CHAPTER 1
OVERVIEW


This document describes the algorithms and mechanisms of the MODEL
Processor, which is a software system that performs a program-writing
function. The MODEL Processor (hereafter called the Processor) has been
designed to automate the program design, coding, and debugging stages of
software development, based on a non-procedural specification of a
program module in the MODEL language. As shown in Figure 1.1, a program
module is formally described and specified in the MODEL language, and
the resulting statements are then submitted to the Processor. The set of
MODEL statements describing a program module is referred to as a
specification. The Processor performs the analysis (including checking
the completeness and consistency of the entire specification),
program module design (including generating a flowchart-like sequence of
events for the module), and code generation functions, thus replacing
the tasks of an application programmer/coder. The Processor's
capability to process a non-procedural specification language is built
on the application of graph theory to the analysis of such
specifications and to the program generation task.

Another  important  function  of  the  Processor  is  to  interact  with  the 
specifier  to  indicate  necessary  supplements  or  changes  to  the  submitted 
statements.

The Processor produces a complete PL/1 program ready for
compilation as well as various reports concerning the specification and
the generated program. The Processor output reports include a listing
of the specification, a cross-reference report, a subscript range
report, a flowchart-like report of the generated program, and a listing
of the generated program, all to be described fully later.

Figure 1.1 The Overall Procedure For Use of MODEL

Processing of a specification written in MODEL by the Processor
consists of four phases, shown in the system flowchart of Figure 1.2,
which is the first refinement of Figure 1.1. Some of these phases
adapt known, state-of-the-art technology, while others involve novel
approaches to the analysis of the specification and to the design and
code generation for the application program.

Each of the four phases depicted in Figure 1.2 is discussed below.

Phase 1: Syntax Analysis of the MODEL Module Specification

In  this  phase,  the  provided  MODEL  specification  is  analyzed  to  find 
syntactic  and  some  semantic  errors.  This  phase  of  the  Processor  is 
itself  generated  automatically  by  a  meta-processor  called  a  Syntax 
Analysis  Program  Generator  (SAPG),  whose  input  is  syntax  rules  provided 
through  a  formal  description  of  the  MODEL  language  in  the  EBNF  language 
(yet to be discussed). In this manner, changes to the syntax of MODEL
during development can be made more easily.

Figure 1.2 Phases of the MODEL IX Processor

A further task of this phase is to store the statements in a
simulated associative memory for ease of later search, analysis, and
processing. A report listing needed corrections and warnings of
possible errors is also produced for the user, together with a
cross-reference report.

The syntax and statement analysis phase is described in detail in
Chapter 3.


Phase 2: Analysis of MODEL Specification

In  this  phase,  precedence  relationships  between  statements  are 
determined  from  analysis  of  the  MODEL  data  and  assertion  statements. 
The specification is analyzed to determine the consistency and
completeness of the statements. Each MODEL statement may be considered
to be an independent, stand-alone statement; the order of the user's
statements is of no consequence. However, in the analysis of the
statements, precedence relationships are determined based on statement
components. These relationships are used to form the nodes and directed
edges of an array graph (yet to be discussed), on which the
completeness, consistency, ambiguity, and feasibility of constructing a
program can be checked. Various omissions or errors are corrected
automatically, especially in connection with the use of subscripts.
Reports are produced for the user
indicating  the  data,  assertions,  or  decisions  that  have  been  made  by  the 
Processor,  or  contradictions  that  have  been  found.  In  addition,  a 
report  showing  the  range  of  each  subscript  is  generated. 

This process is explained in Chapters 4 and 5.


Phase  3:  Automatic  Program  Design  and  Generation  of  Sequence  and 
Control  Logic 

This  phase  of  the  Processor  determines  the  sequence  of  execution  of 
all  events  and  iterations  implied  by  the  specification,  using  graph 
theory  techniques.  It  determines  also  the  sequence  and  control  logic  of 
the  desired  program.  The  result  of  this  phase  is  a  flow  of  events, 
sequenced in the order of execution. Thus, the output of this phase
resembles a program flowchart. At the end of this phase it
is  also  possible  to  produce  a  formatted  report  of  the  specification. 

This  phase  is  presented  in  detail  in  Chapter  6. 
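
To make the idea of deriving an execution order from precedence
relationships more concrete, here is a minimal Python sketch. It is not
the Processor's array-graph algorithm (that is the subject of Chapters 4
to 6): it ignores subscripts and simply orders a few hypothetical
assertions, each given as a target variable with its set of source
variables, using the standard-library graphlib module. The function
name schedule and the sample variables are assumptions made for
illustration only.

```python
from graphlib import CycleError, TopologicalSorter


def schedule(assertions, inputs):
    """Order assertions so every variable is defined before it is used.

    assertions: dict mapping a target variable to its set of source variables.
    inputs: set of variables whose values come from source files.
    """
    # Completeness check: each source must be an input or have an assertion.
    used = {s for sources in assertions.values() for s in sources}
    undefined = used - set(assertions) - set(inputs)
    if undefined:
        raise ValueError(f"undefined variables: {sorted(undefined)}")

    # Precedence edges: a target depends on the assertions defining its sources.
    graph = {
        target: {s for s in sources if s in assertions}
        for target, sources in assertions.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("circular definition in specification") from exc


# Hypothetical assertions: CHARGE = QUANT * PRICE, PRICE = SALPRICE, ...
spec = {
    "CHARGE": {"QUANT", "PRICE"},
    "PRICE": {"SALPRICE"},
    "QUANT": {"QUANTITY"},
}
order = schedule(spec, inputs={"SALPRICE", "QUANTITY"})
print(order)  # PRICE and QUANT appear before CHARGE
```

The order in which the assertions are written in spec does not matter,
which mirrors the non-procedural nature of a specification.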


Phase 4: Code Generation

At this point in the process it is necessary to generate, tailor,
and insert the code into the entries of the flowchart to produce the
program. In particular, read and write input/output commands are
generated whenever the flowchart indicates the need for moving records.
The assertions are developed into PL/1 assignment statements. Wherever
program iterations and other control structures are necessary, program
code  for  them  is  generated.  Declarations  for  object  program  data 
structures  and  variables  are  generated.  Code  is  also  generated  for 
recovery  from  program  failures  when  bad  data  is  encountered  during 
program  execution.  The  product  of  this  phase  is  a  complete  program  in  a 
high  level  language,  PL/1,  ready  for  compilation  and  execution.  A 
listing  of  the  generated  program  is  produced. 


The remainder of this report expands on the above-mentioned phases.
Chapter 2 discusses the syntax and semantics of each type of MODEL
statement. Figure 1.3 provides a tree diagram of the major modules.
The names of the modules in this diagram are referenced throughout the
remainder of this report wherever the corresponding task is explained.
As seen at the top of Figure 1.3, a MONITOR governs the execution of the
different phases of the Processor and does not allow a phase to proceed
unless the previous phases have succeeded. At the second level of
Figure 1.3, the major phases of the Processor are named: (1) SAP
(Syntax Analysis Program), Chapter 3; (2) NETGEN (Network Generation)
and NETANAL (Network Analysis), Chapters 4 and 5; (3) SCHEDULE (Schedule
events and generate flowchart), Chapter 6; and (4) CODEGEN (Code
Generation), Chapter 7. Below this level of Figure 1.3, the diagram
shows the names of the modules subordinate to each of these phases.
Each of these subroutines is discussed in this report.


CHAPTER 2
SYNTAX AND SEMANTICS OF THE MODEL LANGUAGE


2.1  STRUCTURE  OF  A  PROGRAM  SPECIFICATION 

A program specification written in the MODEL language consists of
three major parts: program header, data description, and assertions.

The program header specifies the name of the program and the external
files which store the input or output data of the program. The data
description  statements  are  used  to  specify  the  data  structure  of  the 
input  or  output  files  and  the  structure  of  the  intermediate  results. 
The  assertions  are  used  to  define  the  values  of  the  intermediate  or 
output  variables  specified  in  the  data  description  statements.  Although 
the  user  is  encouraged  to  group  statements  together  and  order  the  parts 
in  the  sequence  mentioned  above,  the  statements  in  a  program 
specification  can  be  put  in  any  order,  i.e.  the  order  of  the  statements 
is  irrelevant  to  the  meaning  of  the  specification.  That  is  one  reason 
why we call MODEL a non-procedural programming language. In this
section we discuss the statements in the program header. Section 2.2
discusses the data description statements, section 2.3 the syntax and
semantics of the assertions, and section 2.4 the use of control
variables.

Only the basic MODEL language is described here. Short-hand and
high level dialects are not described as they are always translated
automatically into the basic language. The syntax rules of the MODEL
statements will be defined with extended BNF notation. Identifiers
enclosed by angle brackets ('<' and '>') are non-terminal symbols.
The metasymbols used include:

1. ::=, it is read as 'is-defined-by'.

2. [...], a pair of square brackets is used to enclose a string which is
optional.

3. |, a vertical bar is used to separate alternatives.

4. {...}*, a pair of braces followed by an asterisk is used to enclose a
string which can repeat any number of times (including zero).


The program header consists of three types of statements, namely
the module statement, the source file statement, and the target file
statement.


Module Statement

The syntax rule for the module statement is as follows.

<module-statement> ::=
    MODULE : <identifier> ;

The  user-chosen  identifier  is  used  as  the  name  of  the  program  being 
specified. 

Source  File  Statement 

The  syntax  rule  for  the  source  file  statement  is  as  follows. 

<source-file-statement> ::=
    SOURCE [ FILES | FILE ] : <identifier> { , <identifier> }* ;

The source file statement lists the names of files which serve as
the input files of the program. The source files are assumed to be
stored on external storage devices.

Target  File  Statement 

The  syntax  rule  for  the  target  file  statement  is  as  follows. 

<target-file-statement> ::=
    TARGET [ FILES | FILE ] : <identifier> { , <identifier> }* ;

The target file statement lists the names of files which serve as
the  output  files  of  the  program.  The  output  files  are  assumed  to  be  on 
external  storage  and  they  serve  to  retain  the  computation  result  for 
future  use. 
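
As an illustration of the three header rules above, the following
Python sketch recognizes a single header statement with a regular
expression. It is only a sketch under stated assumptions: the actual
syntax analyzer is generated by SAPG from the EBNF description, and the
identifier pattern used here (a letter followed by letters, digits, or
underscores) is an assumption, since the lexical rule for identifiers
is not given in this section. The names parse_header, HEADER_RE, and
IDENT_RE are hypothetical.

```python
import re

# One header statement: keyword, optional FILE/FILES, colon, names, semicolon.
HEADER_RE = re.compile(
    r"^\s*(?P<kw>MODULE|SOURCE|TARGET)(?:\s+(?P<opt>FILES|FILE))?"
    r"\s*:\s*(?P<names>[^;]+);\s*$"
)
# Assumed lexical rule for identifiers (not taken from the report).
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def parse_header(statement):
    """Return (keyword, [identifiers]) for one program header statement."""
    m = HEADER_RE.match(statement)
    if m is None:
        raise ValueError(f"not a header statement: {statement!r}")
    names = [name.strip() for name in m.group("names").split(",")]
    if not all(IDENT_RE.fullmatch(name) for name in names):
        raise ValueError(f"bad identifier list: {m.group('names')!r}")
    # The module statement names exactly one program and takes no FILE keyword.
    if m.group("kw") == "MODULE" and (m.group("opt") or len(names) != 1):
        raise ValueError("MODULE takes exactly one identifier")
    return m.group("kw"), names


print(parse_header("MODULE: MINSALE;"))
print(parse_header("SOURCE FILES: TRAN, INVEN;"))
```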


2.2  DATA  DESCRIPTION  STATEMENTS 

In  a  non-procedural  programming  language  every  variable  can  only 
have  a  single  value.  Therefore,  different  variable  names  should  be 
declared for different data involved in the computation. The data
structures in external files, or the schemata of files, can be described
in MODEL with data description statements. Logically related variables
may also be grouped together as in PL/1. The user must also declare the
data types of the components of a variable in data description
statements. The MODEL language has been designed to relieve the user of
concern for I/O control. In general, I/O can be a complicated part of a
programming language. A few simple mechanisms have been included in the
data description statements to ease the I/O programming task. Examples
include the ability to describe file organization and to indicate a key
field for direct access to a record. In section 2.2.1 we will discuss
the  way  to  specify  the  data  type  of  a  variable;  in  section  2.2.2,  the 
way  to  describe  data  aggregates;  and  in  section  2.2.3,  the  mechanisms 
used  for  I/O  related  programming. 


2.2.1  DATA  TYPES 


The smallest unit of data in a program is a field. A field may
contain a datum of some type supported by the MODEL language. The
available data types include picture, character, bit string, and
numbers. It is the user's responsibility to select a data type for each
field.

Field Declaration Statement

The  syntax  rule  for  a  field  declaration  statement  is  as  follows. 

<field-declaration-statement> ::=
    <identifier> [ IS ] <field> <data-type> ;

<field> ::= FLD | FIELD
<data-type> ::= <type> <leng-spec>
<leng-spec> ::= ( <min-length> [ : <max-length> ] )
<min-length> ::= <integer>
<type> ::= <pic-desc> | <string-spec> | <num-spec>
<pic-desc> ::= <pic-type> ' <string> '
<pic-type> ::= PIC | PICTURE
<string-spec> ::= CHAR | CHARACTER | BIT | NUM | NUMERIC
<num-spec> ::= <num-type> [ <fixflt> ]
<num-type> ::= BIN | BINARY | DEC | DECIMAL
<fixflt> ::= FIX | FIXED | FL | FLOAT | FLT
<max-length> ::= <integer>

A  character  string  may  be  of  fixed  length  or  variable  length.  For 
a  fixed  length  character  string  the  length  in  byte  units  should  be 
specified  in  the  type  declaration.  A  variable  length  character  string 
is  specified  through  declaring  the  range  of  the  possible  length  of  the 
string.  When  a  field  X  of  variable  length  string  occurs  in  an  input 
file,  its  length  should  be  specified  by  an  associated  control  variable 
called LEN.X.

Example:

A IS FIELD CHAR(6) ;

B IS FIELD CHAR(0:10) ;

The field A is a string of six characters and the field B is a
variable length character string with maximum length ten. The actual
length of the field B should be specified by a control variable called
LEN.B in some assertion.
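
The length specification can be illustrated with a short Python sketch
that reads a string-typed field declaration and reports whether the
field is of fixed or variable length. This covers only the
<string-spec> alternative of the grammar and is a hedged illustration,
not the Processor's analyzer; the names FIELD_RE and parse_string_field
and the shape of the returned dictionary are assumptions.

```python
import re

# <identifier> [IS] FIELD|FLD <string-spec> ( <min-length> [ : <max-length> ] ) ;
FIELD_RE = re.compile(
    r"^\s*(?P<name>\w+)\s+(?:IS\s+)?(?:FIELD|FLD)\s+"
    r"(?P<type>CHARACTER|CHAR|BIT|NUMERIC|NUM)\s*"
    r"\(\s*(?P<min>\d+)\s*(?::\s*(?P<max>\d+)\s*)?\)\s*;\s*$"
)


def parse_string_field(statement):
    """Parse a string-typed field declaration into a small dictionary."""
    m = FIELD_RE.match(statement)
    if m is None:
        raise ValueError(f"not a string field declaration: {statement!r}")
    low = int(m["min"])
    # With no maximum given, the field has a fixed length equal to the minimum.
    high = int(m["max"]) if m["max"] is not None else low
    if high < low:
        raise ValueError("maximum length is smaller than minimum length")
    return {
        "name": m["name"],
        "type": m["type"],
        "min": low,
        "max": high,
        # A varying field needs its length defined through LEN.<name>.
        "varying": low != high,
    }


print(parse_string_field("A IS FIELD CHAR(6);"))
print(parse_string_field("B IS FIELD CHAR(0:10);"))
```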

The available operations for manipulating character strings include
lexicographic comparison, concatenation, and substring extraction. The
discussion of character strings also applies to the bit string data
type.

The data types for numeric data include picture, floating point
decimal, floating point binary, fixed point decimal, and fixed point
binary. The operations applicable to numeric data are arithmetic
operations, comparison, and conditional definition. It should be noted
that picture and character typed variables have a printable
representation and are therefore suitable for data contained in
reports; the other numeric data types are generally used for data
stored in the computer system. The PL/1 target language incorporates
extensive type conversion, and therefore the user is generally relieved
of this concern.


2.2.2  DATA  STRUCTURES 

Usually there are two ways to group logically related data together
to form a data structure. An array contains homogeneous data elements
and a structure contains heterogeneous data elements. In MODEL a
generalized data aggregate can be used to specify arrays and structures.
The data aggregate is called a group or a record in the MODEL language.

Group  Declaration  Statement 

The  syntax  rule  for  the  group  declaration  statement  is  as  follows. 

<group-declaration-statement> ::=
    <identifier> [ IS ] <group> ( <member-list> ) ;

<group> ::= GRP | GROUP
<member-list> ::= <member> { , <member> }*
<member> ::= <identifier> [ ( <occspec> ) ]
<occspec> ::= * | <minocc> [ : <maxocc> ]
<minocc> ::= <integer>
<maxocc> ::= <integer>

In  the  group  declaration  statement  an  identifier  is  declared  as  a 
data  group  which  contains  a  list  of  members.  Each  member  may  optionally 
repeat  some  number  of  times.  If  a  member  repeats,  it  is  considered  as 
an array of one dimension more than the group containing it. There are
three ways to specify the number of repetitions over a dimension of an
array. If the number of repetitions is a constant, then the constant
can be specified along with the array name. When the number of
repetitions is not fixed but the user knows its maximum, he can
specify a range for the number of repetitions in the group statement.
If the user does not know the maximum, i.e. where the maximum is an
unknown large value, he can denote the range by an asterisk. When the
number of repetitions is not a constant, it can be defined through
control variables with a keyword prefix such as SIZE or END (refer to
section 2.4), or the definition may be omitted if the number can be
determined from an end-of-file indication.

The  members  of  a  data  group  can  be  fields,  or  some  other  data 
groups. A data group may be declared as an array of arrays. In order
to reference a single datum within it, the user has to supply as many
subscripts as there are array dimensions. Thus a member field
becomes a multi-dimensional array.


Example:

A IS GROUP (B, C(10)) ;

B IS FIELD CHAR(6) ;

C IS GROUP (D(5), E(1:50), F(*)) ;

where identifier A is declared as a data group containing two
members B and C. Let us assume that A is a zero dimensional variable.
Since C repeats, it is a one dimensional array. Identifier C contains
three members, D, E, and F. The member D repeats five times, and the
member E may repeat a number of times from one to fifty. The member F
has an unknown number of repetitions, so an asterisk is specified as its
number of repetitions. All the members of data group C are two
dimensional arrays.
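
The dimension rule (a repeating member has one dimension more than the
group containing it) can be checked with a few lines of Python. This is
a sketch only; the dictionary encoding of the declarations and the
function name dimensions are assumptions made for illustration, and the
root group is taken to be zero dimensional as in the example.

```python
def dimensions(groups, root):
    """Compute the number of array dimensions of every declared name.

    groups: dict mapping a group name to a list of (member, repeats) pairs,
            where repeats is True when the member carries an occurrence spec.
    root:   name of the outermost group, assumed to be zero dimensional.
    """
    dims = {root: 0}
    pending = [root]
    while pending:
        parent = pending.pop()
        for member, repeats in groups.get(parent, []):
            # A repeating member adds one dimension to that of its parent.
            dims[member] = dims[parent] + (1 if repeats else 0)
            pending.append(member)
    return dims


# A IS GROUP (B, C(10)); C IS GROUP (D(5), E(1:50), F(*));
groups = {
    "A": [("B", False), ("C", True)],
    "C": [("D", True), ("E", True), ("F", True)],
}
dims = dimensions(groups, "A")
print(dims)
```

For the declarations of the example, the sketch assigns zero dimensions
to A and B, one to C, and two to D, E, and F.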


2.2.3  I/O  RELATED  DATA  AGGREGATES 

In  a  MODEL  specification,  the  user  describes  the  structures  of  the 
data  files  with  data  description  statements.  The  MODEL  processor 
generates  I/O  statements  automatically  for  the  source  and  target  files 
of  the  program  based  on  the  information  in  data  description  statements. 

The  record  declaration  statement  is  syntactically  similar  to  the 
group  declaration  statement.  The  only  difference  is  that  the  keyword 
GROUP is changed to RECORD. A record corresponds to a unit of data
which can be physically transferred between an external file and main
memory.


The file is the highest-level data structure which can be
declared in a MODEL specification. It is not allowed to have a
structure above the file. A file structure may consist of substructures
declared with group, record, or field statements. A well structured
file declaration will have the file entity on the top level. Its
immediate descendants (i.e. members) can be declared either as groups
or records. The groups may contain groups, records, or fields.
Finally, on the lowest level in the file structure the data should be
declared as fields.

File Declaration Statement

The  syntax  rule  for  the  file  declaration  statement  is  as  follows. 

<file-declaration-statement> ::=
    <identifier> [ IS ] FILE [ NAME ] <file-desc>
    ( <member-list> ) ;

<file-desc> ::=
    [ KEY [ NAME ] [ IS ] <identifier> ]
    [ ORG [ IS ] <org-type> ]

<org-type> ::= SAM | ISAM

A  file  may  have  the  KEY  attribute  specified.  In  that  case,  the 
records  in  the  file  are  accessed  by  a  part  of  the  record  contents.  If  a 
file is keyed, there can only be one record type in the file structure
and one of the fields in the record should be declared as the key for
accessing the record. Two types of file organization are supported by
the MODEL language, namely sequential files and index sequential
files. A record in an index sequential file can be accessed faster than
in a sequential file if direct access is necessary.

Example:


MODULE: MINSALE;

SOURCE: TRAN, INVEN;

TARGET: SLIP, INVEN;

TRAN IS FILE (SALEREC(*));

SALEREC IS RECORD (CUSTS, STOCKS, QUANTITY);

CUSTS IS FIELD (CHAR(5));

STOCKS IS FIELD (CHAR(8));

QUANTITY IS FIELD (CHAR(3));

INVEN IS FILE (INVREC)
    KEY STOCKS
    ORG ISAM;

INVREC IS RECORD (STOCKS, SALPRICE, QOH);

STOCKS IS FIELD (CHAR(8));

SALPRICE IS FIELD (NUMERIC(5));

QOH IS FIELD (NUMERIC(5));

SLIP IS FILE (SLIPREC(*));

SLIPREC IS RECORD (CUSTS, STOCKS, QUANT, PRICE, CHARGE);

CUSTS IS FLD (CHAR(12));

STOCKS IS FIELD (CHAR(16));

QUANT IS FIELD (PIC'(11)Z9');

PRICE IS FIELD (PIC'(11)Z9');

CHARGE IS FIELD (PIC'(11)Z9');


2.3 ASSERTIONS

Data  description  statements  define  the  data  structures  of  the 
variables involved in a computation. However, the values of the
variables are defined either automatically by input files or manually by
assertions. Basically an assertion is an equation. On the left hand
side of the equal sign there should be either a simple variable or a
subscripted array name which references an array element. On the right
hand side there can be any arithmetic or logical expression whose value
is used to define the variable on the left hand side. The current
restriction  is  that  the  assertion  can  only  be  used  to  define  the  value 
of  a  field.  Operations  on  the  higher  level  data  structures  are  proposed 
to  be  translated  into  basic  operations  [PNPR  80]. 


2.3.1  SIMPLE  AND  CONDITIONAL  ASSERTIONS 

There are two kinds of assertions which can be used to define the
value of a variable, namely the simple assertion and the conditional
assertion. These have the same syntax as an assignment statement and a
conditional statement in the PL/1 language, respectively. All the
arithmetic and logical operations can be used in the composition of
expressions. In addition, the conditional expression of the ALGOL
language can be used in composing an expression.

Simple  Assertion 

The  syntax  rule  for  the  assertion  is  as  follows. 

<assertion> ::= <simple-assertion> | <conditional-assertion>
<simple-assertion> ::= <variable> = <expression> ;
<variable> ::= <simple-variable> | <subscripted-variable>

The  variable  name  on  the  left  hand  side  of  an  assertion  is  called 
the  target  variable  of  the  assertion  as  its  value  is  defined  by  the 
assertion.  All  the  variables  on  the  right  hand  side  are  called  the 
source  variables  of  the  assertion  since  their  values  are  used  to 
calculate  the  value  of  the  target  variable.  In  the  examples  shown 
below,  a  conditional  expression  is  used  to  define  the  value  of  variable 
M. 

Example: 

1) A = B + 5 ;

2) X(I,J) = 4 * I + J ;

3) M = IF OK THEN 5 ELSE 0 ;

Conditional  Assertion 

The  syntax  of  the  conditional  assertion  is  similar  to  that  of  an  IF 
statement in PL/1.

<conditional-assertion> ::=
    IF <boolean-expression> THEN <assertion>
    [ ELSE <assertion> ]

The  conditional  assertion  has  two  branches,  one  after  the  keyword  THEN 
and  the  other  after  the  keyword  ELSE.  These  two  branches  are 
selectively  executed  according  to  the  truth  value  of  a  boolean 
expression. Since the purpose of an assertion is to define the value of
a variable, there can only be one target variable in an assertion; the
two branches of a conditional assertion must therefore define the same
target variable. It should be noted that the ELSE branch of a
conditional assertion is optional. If it is omitted, the target
variable may be undefined in some cases.

Example: 

1) IF I < 5 THEN A(I) = B(I) ;
            ELSE A(I) = B(I) + 2 ;

2) IF END.X(J) THEN B = X(J) ;
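
The rule that both branches of a conditional assertion define the same
target variable lends itself to a small recursive check. The Python
sketch below assumes a hypothetical tuple encoding of parsed assertions
(("assign", target, expression) and ("if", condition, then, else)); it
is an illustration of the rule, not the Processor's own data structure.

```python
def target_of(assertion):
    """Return the single target variable of an assertion, or raise ValueError.

    Assumed encodings:
      ("assign", target, expression_text)
      ("if", condition_text, then_assertion, else_assertion_or_None)
    """
    if assertion[0] == "assign":
        return assertion[1]
    _, _condition, then_branch, else_branch = assertion
    target = target_of(then_branch)
    # An omitted ELSE branch is legal; the target may then be left undefined.
    if else_branch is not None and target_of(else_branch) != target:
        raise ValueError("branches define different target variables")
    return target


first = ("if", "I < 5",
         ("assign", "A(I)", "B(I)"),
         ("assign", "A(I)", "B(I) + 2"))
second = ("if", "END.X(J)", ("assign", "B", "X(J)"), None)
print(target_of(first), target_of(second))
```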


2.3.2  SUBSCRIPT  EXPRESSIONS 

The  variables  used  in  assertions  are  either  simple  variables  or 
subscripted  variables.  A  specific  element  of  an  N  dimensional  array  can 
be  referenced  with  the  array  name  followed  by  N  subscript  expressions. 
In the following we will discuss how the subscript expressions are
formed and how they are used in composing the assertions.

Subscript expressions are composed of ordinary variables, subscript
variables, and constants with arithmetic operations. The subscript
variable is a special kind of variable. It does not have structure and
it does not hold one specific value. Instead, a subscript variable
assumes integer values in a range from one up to some positive integer.
If the range for a subscript variable is fixed in the whole program
specification, then the subscript variable is called a global subscript.
On the other hand, if the range for a subscript variable is to be
determined for each assertion, the subscript variable is called a local
subscript. There are ten system predefined local subscripts named SUB1,
SUB2, ..., up to SUB10. There are two types of global subscripts. One
is formed by qualifying the name of a repeating data structure with the
keyword prefix FOR_EACH. The other is created by declaring an
identifier as a global subscript with the subscript statement.

Subscript Declaration Statement

The syntax rule for the subscript declaration statement is as
follows.

<subscript-declaration-statement> ::=
    <identifier> IS <subscript> [ ( <occspec> ) ] ;

<subscript> ::= SUBSCRIPT | SUB

The subscript expressions are classified into the following types
according to their form. In the following, let I denote a subscript
variable, c and k denote non-negative integers, and X denote an indirect
indexing vector (refer to section 4.2.2.2). Subscript expressions may be
classified as follows:

1) I,

2) I-1,

3) I-k, where k>1,

4) none of the other types,

5) X(I),

6) X(I-c)-k, where c+k=1,

7) X(I-c)-k, where c+k>1.
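
The classification above can be written down as a small decision
function. The Python sketch below assumes that a subscript expression
has already been reduced to one of two hypothetical forms, "direct"
(meaning I-k) or "indirect" (meaning X(I-c)-k); anything that does not
fit either form, or that uses a negative offset, falls into type 4.
The function name and the encoding are assumptions for illustration.

```python
def subscript_type(form, c=0, k=0):
    """Return the type number (1 to 7) of a subscript expression.

    form == "direct"   encodes I-k       (c is ignored)
    form == "indirect" encodes X(I-c)-k
    Any other form, or a negative c or k, is classified as type 4.
    """
    if form == "direct":
        if k == 0:
            return 1          # I
        if k == 1:
            return 2          # I-1
        if k > 1:
            return 3          # I-k, k>1
        return 4
    if form == "indirect":
        if c < 0 or k < 0:
            return 4
        if c + k == 0:
            return 5          # X(I)
        if c + k == 1:
            return 6          # X(I-c)-k, c+k=1
        return 7              # X(I-c)-k, c+k>1
    return 4


print(subscript_type("direct", k=1))           # I-1
print(subscript_type("indirect", c=1, k=1))    # X(I-1)-1
```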

The range of a global subscript variable in an assertion may be
declared in a subscript declaration statement. If not declared, the
range is derived from an array dimension in which the subscript variable
has been used in a type 1, 2, or 3 subscript expression.

Example:

1) I IS SUBSCRIPT (10) ;

   B(I) = A(I) ;

A global subscript I is declared in the subscript declaration
statement and the range of the value of I is from one to ten. In the
assertion, the global subscript I will assume the integer values in
the range declared in the subscript declaration statement.

2) FACT(SUB1) = IF SUB1=1 THEN 1
                ELSE SUB1 * FACT(SUB1-1) ;

The range of the local subscript SUB1 will be the same as that
of the first dimension of array FACT because the subscript SUB1
occurring in the term FACT(SUB1) has the form of a type 1 subscript
expression.

The  use  of  subscript  variables  allows  us  to  define  all  the  elements 
of an array in one assertion. In the second example above, the whole
vector  FACT  is  defined  by  the  same  assertion. 
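
To see what the single assertion for the factorial vector amounts to
once an iteration has been generated for it, consider the following
Python sketch. It is a hand-written analogue, not Processor output (the
Processor generates PL/1), and the function name define_fact and the
explicit upper bound n are assumptions. The reference to the element at
SUB1-1 is what forces the loop to run in increasing order of SUB1.

```python
def define_fact(n):
    """Define every element of the vector for subscripts 1..n."""
    fact = {}
    for sub1 in range(1, n + 1):
        # Conditional expression: IF SUB1=1 THEN 1 ELSE SUB1 * FACT(SUB1-1)
        fact[sub1] = 1 if sub1 == 1 else sub1 * fact[sub1 - 1]
    return fact


print(define_fact(5))
```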

For multi-dimensional arrays, subscripting array variables may
become tedious. We have adopted the following convention to allow users
to omit subscripts in array references. When all the array references
in an assertion have the same leftmost subscript expression, which is a
type 1 subscript, and when the subscript is not otherwise referred to in
the assertion, then the subscript can be omitted from the assertion
systematically. For example, the following three assertions are
equivalent.

a1: A(I,J,K) = 2 * B(I,J,K) + C(I,J) ;
a2: A(J,K) = 2 * B(J,K) + C(J) ;
a3: A(K) = 2 * B(K) + C ;


2.4  CONTROL  VARIABLES 

Sometimes  it  is  necessary  to  refer  to  attributes  of  the  data,  such 
as  the  number  of  repetitions,  the  length,  or  the  key  for  accessing  a 
record  in  an  index  sequential  file.  In  order  to  allow  reference  to  such 
attributes,  a  number  of  control  variables  are  included  in  the  MODEL 
language.  Since  the  control  variables  are  always  related  to  some 
variable,  they  have  a  form  of  a  qualified  variable,  with  the  name  of  the 
variable  as  the  suffix  and  one  of  several  reserved  keywords  as  the 
prefix.  In  the  following  we  will  assume  that  X  is  a  variable  name 
declared  in  some  data  description  statement.  The  control  variables 
which  can  be  formed  from  X  are  discussed  below. 

SIZE.X 

If  X  is  a  repeating  member  of  some  data  structure,  the  user  can 
specify  the  range  by  defining  the  value  of  a  control  variable  called 
SIZE.X.  It  should  be  noted  that  X  may  be  a  multi-dimensional  array. 
SIZE.X defines only the range of its rightmost dimension; the ranges of
the other dimensions have to be defined separately.

SIZE.X is a variable of integer type. Its value is used to specify
the number of repetitions of the rightmost dimension of array X. If
X(I1,I2,...,In) is an n dimensional array where I1 occurs on the most
significant dimension and In on the least significant dimension, then
the control variable SIZE.X(I1,I2,...,Ik) should be a k dimensional
array with 0<=k<n. The first dimension of SIZE.X has the same range as
the first dimension of array X, the second dimension has the same range
as the second dimension of array X, and so on. The value of SIZE.X
cannot be a function of any subscript Ii with k<i<=n. For every n-1
tuple (I1,I2,...,In-1) which corresponds to a possible combination of
the leftmost n-1 subscripts for array X, the number of elements of array
X with this tuple as their leftmost n-1 subscripts is specified by the
array element SIZE.X(I1,I2,...,Ik).

Example: 


A IS GROUP (B(3)) ;
B IS GROUP (C(*)) ;
C IS FIELD ;
SIZE.C(1) = 4 ;
SIZE.C(2) = 2 ;
SIZE.C(3) = 3 ;

SIZE.C     C

| 4 |      | C(1,1) | C(1,2) | C(1,3) | C(1,4) |

| 2 |      | C(2,1) | C(2,2) |

| 3 |      | C(3,1) | C(3,2) | C(3,3) |


In the example above, array C is two-dimensional. There are three
instances of B in data group A, and each instance of B contains a number
of elements of array C. Correspondingly, the range of the first
dimension of array C is the constant three, and the range of the second
dimension, which may depend on the subscript value of the first dimension,
is specified in the vector SIZE.C. SIZE.C(1) equal to four implies that
there are four elements of array C in the first instance of B, the value
of SIZE.C(2) specifies the number of elements of array C in the second
instance of B, and so on.
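
As an informal illustration of the paragraph above (not part of the MODEL Processor), the following Python sketch builds the ragged layout of array C from the three SIZE.C values of the example. The names `size_c` and `element_names` are hypothetical and chosen only for this sketch.

```python
# Hypothetical illustration: SIZE.C gives the length of the rightmost
# dimension of C for each value of the leading subscript.
size_c = {1: 4, 2: 2, 3: 3}   # SIZE.C(1)=4, SIZE.C(2)=2, SIZE.C(3)=3


def element_names(size):
    """Return the 1-based element names C(i,j), one list per instance of B."""
    return [[f"C({i},{j})" for j in range(1, size[i] + 1)]
            for i in sorted(size)]


rows = element_names(size_c)
for row in rows:
    print(" | ".join(row))
```

Each printed line corresponds to one instance of B, so the row lengths follow SIZE.C rather than a single fixed bound.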


END.X 


If X is a repeating member of a data structure, END.X can be used
to specify the range of the rightmost dimension of array X as an
alternative to the use of SIZE.X.

END.X is a boolean array. If X(I1,I2,...,In) is an n-dimensional
array, then the associated control array END.X(I1,I2,...,In) is an
n-dimensional array, too. The ranges of the dimensions of END.X are the
same as those of the corresponding dimensions of X. The value of END.X
determines the range of the rightmost dimension of array X in the
following way. For every (n-1)-tuple (I1,I2,...,In-1) which is a possible
combination of the leftmost n-1 subscripts of array X, there exists a
sequence of elements in the END.X array with the same leftmost n-1
subscript values, i.e. {END.X(I1,...,In-1,In) | 1<=In}. If
END.X(I1,...,In-1,m) is true and all the elements of
{END.X(I1,...,In-1,In) | 1<=In<m} are false, then there are exactly m
elements in array X with (I1,...,In-1) as their leftmost n-1 subscripts.
The values in END.X may depend on the values in array X, i.e. the number
of repetitions may depend on the data in X.
Example:

For the same array C mentioned above, we may use a two-dimensional
control array END.C to specify the range of the second dimension of
array C as follows.


A IS GROUP (B(3)) ;
B IS GROUP (C(*)) ;
C IS FIELD ;
END.C(SUB1,SUB2) = IF SUB1=1 THEN (SUB2=4)
                   ELSE IF SUB1=2 THEN (SUB2=2)
                   ELSE IF SUB1=3 THEN (SUB2=3) ;

C

| C(1,1) | C(1,2) | C(1,3) | C(1,4) |

| C(2,1) | C(2,2) |

| C(3,1) | C(3,2) | C(3,3) |

END.C

| F | F | F | T |

| F | T |

| F | F | T |

In the first row of END.C the first boolean true occurs in the
fourth element; therefore the fourth element is the last element in the
first row of array C. Similarly, the fact that the second element of the
second row of END.C is true implies that there are only two elements in
the second row of array C.
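
The rule "the first true END element marks the last element of the row" can be sketched in Python as follows. This is only an illustration under assumptions: `row_length`, `end_c`, `lengths` and the scan limit `max_scan` are hypothetical names introduced here, and `end_c` simply mirrors the END.C assertion of the example.

```python
def row_length(end_flags):
    """Number of elements in a row: 1-based position of the first true flag."""
    for position, flag in enumerate(end_flags, start=1):
        if flag:
            return position
    raise ValueError("no END flag is true; the range is undefined")


def end_c(sub1, sub2):
    # Mirrors the END.C assertion of the example: true at the last element.
    return sub2 == {1: 4, 2: 2, 3: 3}[sub1]


def lengths(max_scan=10):
    """Scan each row of END.C until the first true value (max_scan is assumed)."""
    return [row_length(end_c(i, j) for j in range(1, max_scan + 1))
            for i in (1, 2, 3)]


print(lengths())
```

Elements of END.C after the first true value are never inspected, which matches the definition: only the first true element in a row matters.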

Example:

We will show how the END control variable can be used to specify a
varying number of repetitions by finding the greatest common divisor of
two positive integers M and N. Euclid's algorithm is used here.

MODULE: TEST ;
SOURCE: IN ;
TARGET: OUT ;
IN IS FILE (INR) ;
INR IS REC (M,N) ;
OUT IS FILE (OUTR) ;
OUTR IS REC (GCD) ;
WK IS GROUP (WKG(*)) ;
WKG IS GROUP (WK1,WK2) ;
(M,N,GCD,WK1,WK2) IS FIELD NUM(4) ;

WK1(SUB1) = IF SUB1=1 THEN M
            ELSE IF WK1(SUB1-1)>WK2(SUB1-1) THEN
                 WK1(SUB1-1)-WK2(SUB1-1)
            ELSE WK2(SUB1-1) ;

WK2(SUB1) = IF SUB1=1 THEN N
            ELSE IF WK1(SUB1-1)>WK2(SUB1-1) THEN
                 WK2(SUB1-1)
            ELSE WK1(SUB1-1) ;

END.WKG(SUB1) = WK1(SUB1)=WK2(SUB1) ;

IF END.WKG(SUB1) THEN GCD = WK1(SUB1) ;
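
To make the data flow of this specification easier to follow, here is a Python sketch that evaluates the WK1 and WK2 assertions one instance of WKG at a time until END.WKG becomes true. It is an informal reading of the example, not code produced by the Processor; the function name `gcd_by_model_spec` is hypothetical.

```python
def gcd_by_model_spec(m, n):
    """Follow the WK1/WK2 assertions row by row until END.WKG is true."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must be positive integers")
    wk1, wk2 = m, n                    # instance SUB1 = 1
    rows = [(wk1, wk2)]
    while wk1 != wk2:                  # END.WKG(SUB1) = (WK1(SUB1) = WK2(SUB1))
        if wk1 > wk2:
            wk1, wk2 = wk1 - wk2, wk2  # subtract the smaller from the larger
        else:
            wk1, wk2 = wk2, wk1        # otherwise exchange the two values
        rows.append((wk1, wk2))
    return wk1, rows                   # GCD = WK1 at the last instance


gcd, rows = gcd_by_model_spec(12, 18)
print(gcd, rows)
```

The list `rows` plays the role of the repeating group WKG: its length is not known in advance and is fixed only by the first instance for which the END condition holds.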

POINTER.X

If X is a record of a keyed input file F, the instances of the
record X can be selected and ordered according to the value of a control
variable POINTER.X. The control variable POINTER.X has the same number
of dimensions and the same shape as the array X. For every value in the
control variable POINTER.X, the record instance in the file F with that
key value will be presented in the corresponding element of array X. In
order to use the POINTER control variable for selecting and ordering the
records in a keyed file, one of the fields in the record should be declared
as a key in the file declaration statement. The content of the POINTER
control variable is used as the key to access the corresponding record
from the keyed file.

A keyed file may have either sequential or index sequential
organization. If the file is index sequential, the records stored in
the file may be in any order. However, if the file is actually a
sequential file, then the records have to be sorted in ascending
order according to the key field, and the keys used to access the records
should also be sorted in the same order. This is an implementation
restriction. Without it, we cannot read all the records
we want from that file in one pass.

When a keyed file is declared as both a source and a target file, the
target file will be an updated version of the source file. Effectively,
only the selected records may be modified; the rest of the
file is kept intact in the target file. This mechanism makes the
update of a sequential or index sequential file much easier to specify.
Since a key value may occur more than once in the POINTER array, the
corresponding (one) record will be accessed, possibly updated, and
written out several times. To make sure every update to the
same record is effective, the updates have to be done sequentially. We
can envisage that a new version of the keyed file is created after each
record is updated and that every update is done on the most recent
version of the file.

Example:

In the following MODEL specification a source file INVEN is
declared as a keyed file. STOCKS in the record INVREC is the key field
of the INVEN file. Since the control variable POINTER.INVREC is equal to
the field STK in file TRAN, the INVREC records will be ordered according
to the values in the STK field.


MODULE: MINSALE ;
SOURCE: TRAN, INVEN ;

TRAN IS FILE (SALEREC(*)) ;
SALEREC IS RECORD (CUSTS,STK,QUANTITY) ;
CUSTS IS FIELD(CHAR(5)) ;
STK IS FIELD(CHAR(8)) ;
QUANTITY IS FIELD(CHAR(3)) ;

INVEN IS FILE (INVREC(*))
      KEY STOCKS
      ORG ISAM ;
INVREC IS RECORD(STOCKS,SALPRICE,QOH) ;
STOCKS IS FIELD(CHAR(8)) ;
SALPRICE IS FIELD(NUMERIC(5)) ;
QOH IS FIELD(NUMERIC(5)) ;

POINTER.INVREC = TRAN.STK ;
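
The effect of sequential updates through a POINTER array can be sketched in Python. The sketch assumes, purely for illustration, that each sale reduces the quantity on hand of the matching inventory record; the specification above does not state such an assertion, and the name `apply_sales` and the sample keys are hypothetical.

```python
def apply_sales(inventory, sales):
    """inventory: dict mapping stock key -> quantity on hand (the keyed file).
    sales: list of (stock_key, quantity) in transaction order."""
    updated = dict(inventory)              # the target file starts as a copy
    for stock_key, quantity in sales:      # key values come from the POINTER array
        if stock_key in updated:
            # Each update is applied to the most recent version of the record.
            updated[stock_key] -= quantity
    return updated


inventory = {"A100": 10, "B200": 5}
result = apply_sales(inventory, [("A100", 3), ("A100", 2), ("Z999", 1)])
print(result)
```

Because the two sales for the same key are applied one after the other, both take effect, and the record that was never selected is carried over unchanged.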
FOUND.X

If X is a record in a keyed file, then it is accessed through the
value of a POINTER control variable. It may happen that the key value
used to access the record does not match any record, in which case the
access fails. The user may test the value in a control variable called
FOUND.X to find out whether a record with some specific key exists or
not. This information may be used to decide whether a new record should
be added to the file or an old record should be updated. The control
variable FOUND.X has the same shape as array X and POINTER.X. Its data
type is boolean.
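
A small Python sketch of the relationship between POINTER.X, X and FOUND.X, under the assumption that a keyed file can be modelled as a dictionary. The function name `select_records` and the sample data are hypothetical.

```python
def select_records(keyed_file, pointer):
    """Return (records, found), both with the same shape as the pointer list."""
    records = [keyed_file.get(key) for key in pointer]  # None if the key is absent
    found = [key in keyed_file for key in pointer]      # plays the role of FOUND.X
    return records, found


keyed_file = {"A100": {"QOH": 10}, "B200": {"QOH": 5}}
records, found = select_records(keyed_file, ["B200", "Z999", "A100"])
print(found)
```

A false entry in `found` is the case in which a specification might choose to add a new record instead of updating an existing one.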

LEN.X 

If X is a field in some record and its data type is variable-length
character string, then the actual length of X is specified by the
control variable LEN.X, which is used to disassemble the input or output
records. Corresponding to every element of array X, there is an element
in LEN.X. The values in the array LEN.X are integers, and any
integer-type expression can be used to define LEN.X. The only restriction
is that the content of LEN.X should not depend upon any data physically
positioned in a record after the data field X.
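
The restriction on LEN.X can be illustrated with a Python sketch that splits a record whose variable-length field is preceded by its own length. The record layout used here (a two-digit length, then the field X, then the remaining data) is an assumption made only for this example, as is the name `split_record`.

```python
def split_record(record):
    """Assumed layout: 2-digit length, then that many characters of X,
    then whatever data follows X in the record."""
    length = int(record[:2])        # plays the role of LEN.X
    x = record[2:2 + length]        # the variable-length field X
    rest = record[2 + length:]      # data positioned after X
    return length, x, rest


print(split_record("05HELLOxyz"))
```

The length is taken from data that precedes X, so the record can be taken apart in a single left-to-right scan; a length that depended on data after X could not be used this way.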

NEXT.X 

If X is a field in an input sequential file, the control variable
NEXT.X can be used to denote the same field in the next physical record
of the file. Although the next record usually means the record with a
subscript value one larger than the current record, this may not be true
when the current record is the last record in some group. The problem
is caused by the fact that the user is dealing with structured data but
the real data in the external file is in a linear form. Sometimes the
information used to transform a sequence of records into a structured
form can only be conveniently expressed in terms of records that are
physically contiguous. For example, we may want to compare the value of
a key field in two adjacent records to determine whether a record is the
last record in a group or not. The fact that the current record and the
next record may or may not be in the same group causes trouble in
referencing the next record.

Example: 

Suppose the records in a transaction file contain a customer number
and some relevant information, and the records are sorted according to
the value of the customer number field. We may use the following
specification to describe the data structure.


TRANSACTION IS FILE (CUSTOMER(*)) ;
CUSTOMER IS GROUP (TRANS_REC(*)) ;
TRANS_REC IS RECORD (CUSTOMER_NO, INFORMATION) ;
CUSTOMER_NO IS FIELD (PIC'99999999') ;
I IS SUBSCRIPT ;
J IS SUBSCRIPT ;

END.TRANS_REC(I,J) =
     CUSTOMER_NO(I,J) ^= NEXT.CUSTOMER_NO(I,J) ;


The term NEXT.CUSTOMER_NO(I,J) in the last assertion cannot be
replaced by CUSTOMER_NO(I,J+1) because there may not be a record with
this pair of subscript values. The restriction in using the control
variable NEXT.X is that the position of field X in a record should be
fixed, i.e. the fields to the left of field X cannot be variable-length
strings or repeat a variable number of times; otherwise,
the field X in the next record may not be located correctly.
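
The grouping expressed by the END.TRANS_REC assertion can be sketched in Python. The sketch assumes that the last physical record of the file also closes its group, a case the text does not spell out; `group_by_customer` is a hypothetical name.

```python
def group_by_customer(records):
    """records: list of (customer_no, information), sorted by customer_no."""
    groups, current = [], []
    for position, record in enumerate(records):
        current.append(record)
        is_last_physical = position + 1 == len(records)   # assumed end-of-file rule
        # END.TRANS_REC: the customer number differs from the next physical record.
        if is_last_physical or record[0] != records[position + 1][0]:
            groups.append(current)
            current = []
    return groups


print(group_by_customer([(1, "a"), (1, "b"), (2, "c")]))
```

The comparison uses the physically next record (`records[position + 1]`), not the next element of the current group, which is the distinction between NEXT.CUSTOMER_NO(I,J) and CUSTOMER_NO(I,J+1).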


SUBSET.X


If X is a record in an output file, then the control variable
SUBSET.X can be used to selectively omit some records from the output
file. The SUBSET.X control variable is a boolean array of the same
shape as the array X. When an element in SUBSET.X has the value
true, the corresponding record X will be put into the output
file. On the other hand, if the element has the value false,
the corresponding record will not be put into the output file. It
should be noted that the use of the SUBSET control variable does not
affect any other computations. Only a subset of records X may be omitted
from the output file.
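
In Python terms, SUBSET.X behaves like a boolean mask applied only at output time. The following sketch uses the hypothetical name `write_subset` and models the output file as a list.

```python
def write_subset(records, subset_flags):
    """Keep the records whose SUBSET flag is true; the inputs are not changed."""
    if len(records) != len(subset_flags):
        raise ValueError("SUBSET.X must have the same shape as X")
    return [record for record, keep in zip(records, subset_flags) if keep]


all_records = ["r1", "r2", "r3"]
print(write_subset(all_records, [True, False, True]))
```

All three records remain available to other computations; only the list that is written out is shortened.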
CHAPTER 3

SYNTAX  ANALYSIS  PROGRAM 


The first phase of the MODEL Processor analyses the syntax and
other local semantics of individual statements. The syntax analysis
techniques used here have proved invaluable. Specifically, the
capability to generate the parser automatically has enabled rapid
changes during development. In addition
to checking the MODEL statements for syntactic and some semantic errors,
this phase also stores the statements in an internal associative form
for later processing.


3.1  EBNF,  SAPG,  AND  THE  SAP 

3.1.1 SPECIFICATION OF MODEL USING EBNF AND THE SAPG

The Syntax Analysis Program (SAP) for the MODEL statements is
generated automatically by a Syntax Analysis Program Generator (SAPG).
As shown in Figure 3.1, the SAPG produces the Syntax Analysis Program
(SAP) for analyzing MODEL statements, based on a specification of the
MODEL language expressed in the EBNF/WSC (Extended Backus Normal Form
With Subroutine Calls) meta language.

Figure 3.1 Block Diagram of SAPG and SAP

The EBNF/WSC includes the traditional concepts of BNF. BNF uses
sequences of characters enclosed in angle brackets < >, called
non-terminals, to give names to grammatical units for which
substitutions may be made. It also uses sequences of characters not
enclosed in brackets, which are in the object language (in this case
MODEL). BNF consists of a series of production rules or substitution
rules of the form "A ::= B", where "A" is a single non-terminal symbol and
"B" is one or more alternative sequences of terminal or non-terminal
symbols that can be substituted for A. The alternatives are separated
by the meta-symbol "|". To facilitate language description, BNF was
extended to EBNF with two more well-known meta-symbols: [ ]
representing optionality and [ ]* representing zero or more repetitions.

The specification of MODEL that is input to the SAPG consists not
only of the syntax specification of MODEL, but also of subroutine names
embedded within the EBNF; hence the name "EBNF With Subroutine
Calls" (EBNF/WSC). The SAPG provides a capability to branch to these
subroutines upon successful recognition of a syntactic unit. Thus, they
can complete the SAP to enable it to check some of the statement
semantics, to encode, to produce error messages, and to store the MODEL
statements for later retrieval. The invocations of these subroutines
themselves are written manually. The definition of the MODEL language
in EBNF/WSC appears in Figure 3.2. The subroutines to be invoked are
indicated between slashes (/.../). Note that subroutine calls are made
after the successful recognition of syntactic units up to that point.
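
The following Python sketch illustrates, in a simplified way, how optional items, repetitions and embedded subroutine calls could be interpreted: a call is logged only when everything before it in the sequence has been recognized. It is not the SAPG's implementation (which generates PL/I procedures); the item encoding and the function `match` are assumptions of this sketch.

```python
def match(items, tokens, pos, log):
    """Match a sequence of grammar items against tokens, starting at pos.

    Items: ("t", word) terminal, ("opt", [items]) for [ ],
    ("rep", [items]) for [ ]*, ("call", name) for /name/.
    Returns the new position, or None if the sequence is not recognized.
    """
    for kind, arg in items:
        if kind == "t":
            if pos < len(tokens) and tokens[pos] == arg:
                pos += 1
            else:
                return None
        elif kind == "call":
            log.append(arg)              # reached only after the preceding items matched
        elif kind in ("opt", "rep"):
            while True:
                trial = []               # calls of a failed attempt are discarded
                new_pos = match(arg, tokens, pos, trial)
                if new_pos is None or new_pos == pos:
                    break
                pos = new_pos
                log.extend(trial)
                if kind == "opt":
                    break
        else:
            raise ValueError(f"unknown item kind {kind!r}")
    return pos


# Simplified form of the <SOLUTION> rule: [SOLUTION] METHOD [IS] NEWTON /SVMETH/ [,]
solution = [("opt", [("t", "SOLUTION")]), ("t", "METHOD"), ("opt", [("t", "IS")]),
            ("t", "NEWTON"), ("call", "SVMETH"), ("opt", [("t", ",")])]
calls = []
end = match(solution, ["METHOD", "IS", "NEWTON"], 0, calls)
print(end, calls)
```

With the token list `["METHOD", "JACOBI"]` the same rule returns `None` and logs nothing, because the call item is never reached.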
The SAP generated by the SAPG according to the EBNF/WSC is
supplemented and linked with the routines. The SAP accepts statements
in MODEL and checks them for syntactic correctness and local semantics.
It produces a listing of the statements, syntax diagnostics, an encoded
stored version of the MODEL statements, syntax trees for the assertions
and a cross-reference report.
<MODEL_SPECIFICATION> ::= [ <MODEL_BODY_STMTS> /CLRERRP/ ]*
                          /STVr_PV <MODEL_SPECIFICATION>
<MODEL_BODY_STMTS> ::= /E(80)/
                       MODULE <MODULE_NAME_STMT>
                     | SOURCE <SOURCE_FILES_STMT>
                     | TARGET <TARGET_FILES_STMT>
                     | '_END' /ENDINP/
                     | <DCL_DESCRIPTION>
                     | <BLOCK_BEGIN>
                     | <BLOCK_END>
                     | <OLD_FILE_STMT>
                     | /ASSINIT/ <ASSERTIONS> /STRHS/
<DCL_DESCRIPTION> ::= 1 /INTDCL/ /INTMVAR/ /MEMINIT/ /SVMEM/
                      <DATA_SPEC>
                      [, /E(108)/ <INTEGER> /CRDCL/
                      /INTMVAR/ /MEMINIT/ /SVMEM/
                      <DATA_SPEC> ]* /STDCL/ <ENDCHAR>
<DATA_SPEC> ::= <DCL_MVAR> [( <OCCSPEC> )] [ <IS> ]
                <ATTR_SPEC> /SVDCL/
<ATTR_SPEC> ::= <FILE> /SVF/ /SVFLNM/ <FILE_DESC>
                <STORAGE_DESC> /STDEV/
              | <RECORD> /SVR/
              | <FIELD_STMT> /STDFLD/ /SVD/
              | [<GROUP>] /SVG/
<BLOCK_BEGIN> ::= BLOCK /BLKINIT/ [ <NAME> /SVLBL/ ] /E(2)/
                  : [ <BLOCK_SPEC> ]* /SVBLOK/ <ENDCHAR>
<BLOCK_SPEC> ::= <SOLUTION> | <ITERATION> | <REL_ERROR>
<SOLUTION> ::= [ SOLUTION ] METHOD [ <IS> ] /E(62)/
               <METHODS> /SVMETH/ [ , ]
<METHODS> ::= NEWTON | GAUSS_SEIDEL | G_S | JACOBI
<ITERATION> ::= [ <MAXIMUM> ] <ITER> [ <IS> ] /E(4)/
                <NUMBER> /SVITER/ [ , ]
<MAXIMUM> ::= MAX | MAXIMUM
<ITER> ::= ITER | ITERATION | ITERATIONS
<REL_ERROR> ::= [ RELATIVE ] <ERROR> [ <IS> ] /E(5)/
                <NUMBER> /SVERR/ [ , ]
<ERROR> ::= ERR | ERROR
<BLOCK_END> ::= <END> /BLKEND/ [ <NAME> /CHKLBL/ ] <ENDCHAR>
<END> ::= /ENDID/
<ASSERTIONS> ::= /E(14)/ <CONDITIONAL> |
                 /SVASSR/ /INTMVAR/ <MVAR> /STMVAR/ /SVCMP1/
                 [ <IS> /SVNXOP/ ] <DDL_OR_RHS>
<CONDITIONAL> ::= IF /SVAAS1/ /SV0P1/ /SETBIT/ /E(18)/
                  <BOOLEAN_EXPRESSION> /SVCMP1/ /E(38)/
                  THEN /SVNXDP/ <SIMPLE_ASSERTION> /SVNXCMP/
                  [ELSE /SVNXOP/ <ASSERTION> /SVNXCMP/] /STALL/
<ASSERTION> ::= /E(14)/ <CONDITIONAL> | <SIMPLE_ASSERTION>

Figure 3.2 Definition of MODEL language in EBNF/WSC




<DDL_OR_RHS> ::= / INTOOO V <DATA_DESC_STMT> /FREETMP/
               | /E(33)/ <INTOAS> <ASSERTION_BRANCH>
                 <ENDCHAR>
<ASSERTION_BRANCH> ::= <DEF_EXPRESSION>
                     | <BOOLEAN_EXPRESSION> /SVNXCMP/ /STALL/
<DEF_EXPRESSION> ::= /INTSUB/ { <VALUE_LIST> } /FREESUB/
<VALUE_LIST> ::= ( /CRSUB/ /DECPP/ <VALUE_LIST>
                 [, <VALUE_LIST> ]* ) /INCPP/
               | [<SIGN> /SVDPP/] <NUMBER> /STNUM/ /STASS/
<INTOAS> ::= /INTOASS/
<SIMPLE_ASSERTION> ::= /SVASAE1/ /INTMVAR/ <MVAR> /STMVAR/
                       /SVCMP1/ /E(23)/ = /SVNXDP/
                       <BOOLEAN_EXPRESSION> /SVNXCMP/ /STALL/
                       <ENDCHAR>
<SUB_VARIABLE> ::= /SETSUBV/ <VAR> /SVCMP1/
                   [( /SVNXDP/ /SETBIT/ /E(22)/
                   <BOOLEAN_EXPRESSION> /SVNXCMP/ [, /SVNXDP/
                   <BOOLEAN_EXPRESSION> /SVNXCMP/ ]*
                   /E(24)/ ) ] /STALL/
<BOOLEAN_EXPRESSION> ::= /E(82)/ /SVBEXP/ <COND_EXP>
                       | <BOOLEAN_TERM> /SVCMP1/
                         [ <OR> /SVNXDP/ <BOOLEAN_TERM>
                         /SVNXCMP/ ]* /STALL/
<COND_EXP> ::= IF /SVCOND/ /E(3)/ <BOOLEAN_EXPRESSION>
               /SVCMP1/ /E(79)/ THEN /SVNXDP/
               <BOOLEAN_EXPRESSION> /SVNXCMP/ /E(12)/ ELSE
               /SVNXDP/ <BOOLEAN_EXPRESSION> /SVNXCMP/
               /STALL/
<OR> ::= /OR_REC/
<BOOLEAN_TERM> ::= /E(83)/ /SVBT1/ <BOOLEAN_FACTOR> /SVCMP1/
                   [ & /SVNXDP/ <BOOLEAN_FACTOR> /SVNXCMP/ ]*
                   /STALL/
<BOOLEAN_FACTOR> ::= /E(82)/ /SVBF1/ <CONCATENATION> /SVCMP1/
                     [ <RELATION> /SVNXDP/ <CONCATENATION>
                     /SVNXCMP/ ]* /STALL/
<RELATION> ::= /RELREC/
<CONCATENATION> ::= /E(84)/ /SVCON/ <ARITH_EXP> /SVCMP1/
                    [ <CONCAT> /SVNXDP/ <ARITH_EXP>
                    /SVNXCMP/ ]* /STALL/
<CONCAT> ::= /CATREC/
<ARITH_EXP> ::= /E(81)/ /SVAE/ [<SIGN> /SVDP1/]
                <TERM> /SVCMP1/ [<OPS> /SVNXDP/ <TERM>
                /SVNXCMP/ ]* /STALL/
<TERM> ::= /E(87)/ /SVTERM/ <FACTOR> /SVCMP1/
           [ <MOPS> /SVNXDP/ <FACTOR> /SVNXCMP/ ]* /STALL/
<FACTOR> ::= /E(85)/ /SVFAC/ [ /SVDP1/ ] <PRIMARY> /SVCMP1/
             [ <EXPON> /SVNXDP/ <PRIMARY> /SVNXCMP/ ]* /STALL/
<EXPON> ::= /EXPREC/
<PRIMARY> ::= /E(86)/ /SVPRIM/ <IS_PRIM> /SVCMP1/ /STALL/
<IS_PRIM> ::= ( <BOOLEAN_EXPRESSION> /E(24)/ )
            | <NUMBER> /STNUM/ | <STRING_FORM>
            | <FUNCTION_CALL> | <SUB_VARIABLE>
<STRING_FORM> ::= ' /SETSTRN/ [ <STRING> /SVSTRNG/ ] /E(26)/
                  ' /ADLEX/ [B /STBIT/ /E(1)/ <B_SUFX>]
                  /STNUM/
<FUNCTION_CALL> ::= <FUNCTION_NAME> /STFUN/
                    /SETFUNC/ [ ( /SVNXDP/ <BOOLEAN_EXPRESSION>
                    /SVNXCMP/ [, /SVNXDP/ <BOOLEAN_EXPRESSION>
                    /SVNXCMP/ ]* ) ] /STALL/
<FUNCTION_NAME> ::= /FNCHECK/
<MVAR> ::= ( <SUB_VARIABLE> /SVMVAR/
           [, <SUB_VARIABLE> /SVMVAR/ ]* )
         | <SUB_VARIABLE> /SVMVAR/
<VAR> ::= /SETVAR/ /INITQNM/ /E(68)/ <NAME> /ADLEX/ /MKQNM/
          [ . /ADLEX/ /E(68)/ <NAME> /ADLEX/ /MKQNM/ ]*
          /STR_CON/
<DCL_MVAR> ::= ( <VAR> /SVMVAR/ [, <VAR> /SVMVAR/ ]* )
             | <VAR> /SVMVAR/
<B_SUFX> ::= /BITSTR/
<QNAME> ::= /INITQNM/ /E(68)/ <NAME> /MKQNM/
            [ . /E(68)/ <NAME> /MKQNM/ ]*
<STRING> ::= <STRING_CONST>
<OPS> ::= /OPREC/
<MOPS> ::= /MOPREC/
<TEST> ::= /TESTBIT/
<MODULE_NAME_STMT> ::= /E(63)/ : /E(64)/ <NAME> /STOOD/
                       <ENDCHAR>
<SOURCE_FILES_STMT> ::= [ <FILE_KEYWORD> ] /E(75)/ /INITSFL/ :
                        <SOURCE_FILELIST> /STSRC/ <ENDCHAR>
<FILE_KEYWORD> ::= FILES | FILE
<SOURCE_FILELIST> ::= /E(76)/ <NAME> /SVSRC/
                      [, /E(76)/ <NAME> /SVSRC/ ]*
<TARGET_FILES_STMT> ::= [ <FILE_KEYWORD> ] /E(77)/ /INITTFL/ :
                        <TARGET_FILELIST> /STTAR/ <ENDCHAR>
<TARGET_FILELIST> ::= /E(78)/ <NAME> /SVTAR/
                      [, /E(78)/ <NAME> /SVTAR/ ]*
<DATA_DESC_STMT> ::= <DATA_DESCRIPTION> <ENDCHAR>
<DATA_DESCRIPTION> ::=
                       <FILE_STMT> /STFILE/
                     | <RECORD_STMT> /STREC/
                     | <GROUP_STMT> /STGRP/
                     | <FIELD_STMT> /STFLD/
                     | <SUB_STMT> /STSUBST/
<SUB_STMT> ::= <SUBSCRIPT> /MEMINIT/ /SVMEM/ [( <OCCSPEC> )]
<SUBSCRIPT> ::= SUB | SUBSCRIPT | SUBSCRIPTS
<FILE> ::= FILE | REPORT | FILES | REPORTS

<RECORD_STMT> ::= <RECORD> /MEMINIT/ [(] <ITEM_LIST> [)]
<RECORD> ::= REC | RECORD | RECORDS
<ITEM_LIST> ::= /E(52)/ <ITEM> [ [,] <ITEM> ]*
<ITEM> ::= <NAME> /SVMEM/ [ . <NAME> /SVMEM/ ]*
           [( <OCCSPEC> )]
<OCCSPEC> ::= <STAR> /SVSTAR/ | <MINOCC> /SVMIOC/ [ <MAXOCC> ]
<STAR> ::= /STARREC/
<MINOCC> ::= <INTEGER>
<MAXOCC> ::= [: /E(51)/] <INTEGER> /SVMXDC/ /OOWMX/
           | <INTEGER> /SVMXDC/ /CXMNMX/
<GROUP_STMT> ::= <GROUP> /MEMINIT/ [(] <ITEM_LIST> [)]
<GROUP> ::= GRP | GROUP | GROUPS
<FIELD_STMT> ::= <FIELD> /SVFLD/ <FIELD_ATTR>
<FIELD> ::= FLD | FIELD | FIELDS
<FIELD_ATTR> ::= [(] <TYPE> /SVPDTP2/ [ <LENG_SPEC> ]
                 [,] [<LINE_SPEC>] [,] [ <COL_SPEC> ] [)]
<LENG_SPEC> ::= ( /E(48)/ <MIN_LENGTH> [ <MAX_LENGTH> ]
                /E(49)/ )
              | <MIN_LENGTH> [ <MAX_LENGTH> ]
<MIN_LENGTH> ::= <INTEGER> /SVWJFIN/
<LINE_SPEC> ::= LINE /E(53)/ /E(54)/ /E(55)/
                ( <INTEGER> /SVLINE/ )
<COL_SPEC> ::= COL /E(90)/ /E(91)/ /E(92)/
               ( <INTEGER> /SVCOL/ )
<TYPE> ::= /E(47)/ <PIC_DESC> | <STRING_SPEC> | <NUM_SPEC>
<PIC_DESC> ::= <PIC_TYPE> /E(67)/ /SVPIC/
               ' [ <STRING> /SVPICST/ ] ' /STPIC/
<PIC_TYPE> ::= PIC | PICTURE
<STRING_SPEC> ::= <STRING_TYPE> /SVSTRTP/
<STRING_TYPE> ::= CHAR | CHARACTER | BIT | NUM | NUMERIC
<NUM_SPEC> ::= <NUM_TYPE> /SVNUMTP/ [ <FIXFLT> /SVMOD/ ]
<NUM_TYPE> ::= BIN | BINARY | DEC | DECIMAL
<FIXFLT> ::= FIX | FIXED | FL | FLOAT | FLT
<MAX_LENGTH> ::= [:] <INTEGER> /SVMXFIN/
               | , /E(46)/ <SINTGR> /SVSCALE/
               | <INTEGER> /SVMXFIN/
<SINTGR> ::= - /E(50)/ <INTEGER> /NEGATE/ | <INTEGER>
<NUMBER> ::= /SETNUM/ <INITNUM> /E(65)/ <RECNUM>
<RECNUM> ::= /RECNUM/
<INITNUM> ::= /INITNUM/
<SIGN> ::= + | -
<RECG> ::= <RECORD> | <GROUP>
<KEY> ::= KEY | SEQUENCE
<CODE> ::= EBCDIC | BCD | ASCII
<ANY> ::= <NAME> | <INTEGER>
<NO_TRKS> ::= 7 | 9
<DENSITY> ::= 200 | 556 | 800 | 1600 | 6250
<PARITY> ::= ODD | EVEN

<TYPEDSK> ::= 2314 | 2311 | 3330 | 2305 | 3330-1
<ORG> ::= ORG | ORGANIZATION
<ORG_TYPE> ::= /E(7)/ ISAM | SEQUENTIAL | SAM | INDEXED_SEQUENTIAL
<ENDCHAR> ::= /E(74)/ <END_CHAR> /STOTINC/
<END_CHAR> ::= /SVENDC/
<STRING_CONST> ::= /CHARSTR/
<NAME> ::= /NAMEREC/
<INTEGER> ::= /INTREC/
<IS> ::= IS | = | ARE
<FILE_STMT> ::= <FILE> /SVFLNM/ /MEMINIT/ <SON_DESC>
                <FILE_DESC> <STORAGE_DESC> /STDEV/
<SON_DESC> ::= ( <ITEM_LIST> )
             | <RECG> [NAME] [<IS>] [(] <ITEM> [)]
<OLD_FILE_STMT> ::= <FILE> [NAME] [<IS>] /E(56)/ /MEMINIT/
                    /INTMVAR/
                    <DCL_MVAR> /SVFLNM/
                    <RECG> [NAME] [<IS>] [(] <ITEM> [)]
                    <FILE_DESC> /STFILE/
                    <STORAGE_DESC> /STDEV/ <ENDCHAR>
<FILE_DESC> ::= [ STORAGE [NAME] [<IS>] /E(44)/ <NAME>
                /SVSTNM/ ]
                [ <KEY> [NAME] [<IS>] /E(45)/ <NAME> /SVKEY/ ]
                [ <ORG> [<IS>] <ORG_TYPE> /SVORG3/ ]
<STORAGE_DESC> ::= [ DEVICE [<IS>] <DEVICE> ] /SVDEV/
                   [ [RECORD /E(57)/] FORMAT [<IS>] <REC_FMT> ] /SVRECP/
                   <BLK_REC_VOL>
                   [ <TAPE_DESC> ] [ <DISK_DESC> ]
                   [HARDWARE] [SOFTWARE]
<DEVICE> ::= /E(61)/ TAPE | DISK /SETDEVB/
           | CARD /SETDEVC/ | PRINTER /SETDEVP/
           | PUNCH /SETDEVU/ | TERMINAL /SETDEVT/
<REC_FMT> ::= /E(69)/ FIXED | VARIABLE | VAR_SPANNED | UNDEFINED
<BLK_REC_VOL> ::=
                  [ [MAX] /E(70)/ /E(71)/ BLOCKSIZE [<IS>]
                  <INTEGER> /SVBLK/ ]
                  [ [MAX /E(59)/] RECORDSIZE [<IS>] /E(72)/
                  <INTEGER> /SVRCSZ/ ]
                  [ VOLUME [NAME] [<IS>] /E(60)/ <NAME>
                  /SWOI/ [, /E(60)/ <NAME> ]* ]
<TAPE_DESC> ::= [ <TRACKS> [<IS>] /E(66)/ <NO_TRKS> /SVTRK2/ ]
                [ PARITY [<IS>] /E(66)/ <PARITY> /SVPAR2/ ]
                [ DENSITY [<IS>] /E(66)/ <DENSITY> /SVDEN2/ ]
                [ [TAPE] LABEL [<IS>] <LABEL_TYPE> /SVLAB2/ ]
                [ START [FILE] [<IS>] /E(66)/ <INTEGER>
                /SVSTFL2/ ]
                [ [CHAR] CODE [<IS>] <CODE> /SVCC/ ]
<TRACKS> ::= NO_TRKS | TRACKS
<LABEL_TYPE> ::= /E(58)/ IBM_STD | ANSI_STD | NONE | BYPASS
<DISK_DESC> ::= [ UNIT [<IS>] /E(9)/ <TYPEDSK> /SVUNIT2/ ]
                [ <CYLINDERS> /SVUCYL/ [<IS>] /E(66)/
                <INTEGER> /SVQTY2/ ]
<CYLINDERS> ::= NO_CYLS | CYLINDERS
<HARDWARE> ::= [ [COMPUTER] MODEL [<IS>] <ANY> ]
<SOFTWARE> ::= [ [OPERATING] SYSTEM [<IS>] <ANY> ]



3.1.2  HOW  THE  SAPG  PRODUCES  THE  SAP 

The SAPG is a parser generator. It accepts a specification in the
language EBNF/WSC and produces a parser program (SAP). It performs this
in three passes over the set of productions.

In pass 1, each production is scanned, and its components are
encoded into a set of tables. Non-terminal symbols appearing on the
left-hand-side of a production (new production names) are put into a
symbol table (LHS-NT-SYM-TAB), while non-terminals appearing on the
right-hand-side of a production are put into another symbol table
(RHS-NT-SYM-TAB). Terminal symbols in a production are put into a
terminal symbol table (TERM-SYM-TAB). Subroutine calls are put into yet
another table (SUB-TAB).

In pass 2, the symbolic references in RHS-NT-SYM-TAB (i.e.
non-terminals on the right-hand-side of the original production) are
resolved. Pass 2 checks that each non-terminal symbol in RHS-NT-SYM-TAB
is defined, and links it to the corresponding entry in LHS-NT-SYM-TAB.
Undefined non-terminals as well as circularly-defined non-terminals can
be detected in these table searches.
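
The first two passes can be sketched in Python as table building followed by a lookup. This is a simplified illustration, not the SAPG's code: it assumes productions are given as text lines of the form `<LHS> ::= ...`, it omits the terminal symbol table and the detection of circular definitions, and the function names `pass1` and `pass2` are hypothetical.

```python
import re


def pass1(productions):
    """Collect left-hand non-terminals, right-hand non-terminals and subroutine names."""
    lhs_tab, rhs_tab, sub_tab = [], [], []
    for text in productions:
        left, right = text.split("::=", 1)
        lhs_tab.append(left.strip().strip("<>"))
        rhs_tab.extend(re.findall(r"<(\w+)>", right))   # non-terminals on the right
        sub_tab.extend(re.findall(r"/(\w+)/", right))   # embedded subroutine names
    return lhs_tab, rhs_tab, sub_tab


def pass2(lhs_tab, rhs_tab):
    """Return the right-hand non-terminals that have no defining production."""
    defined = set(lhs_tab)
    return sorted({name for name in rhs_tab if name not in defined})


productions = [
    "<FIELD_STMT> ::= <FIELD> /SVFLD/ <FIELD_ATTR> /STFLD/",
    "<FIELD> ::= FLD | FIELD | FIELDS",
]
lhs, rhs, subs = pass1(productions)
print(pass2(lhs, rhs))
```

In this two-production fragment `FIELD_ATTR` is reported as undefined, which is the kind of error that would keep pass 3 from being entered.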

Pass 3 of the SAPG is the code-generation phase that produces the
SAP in PL/I. It is only entered if no errors were encountered in the
previous phases. For each EBNF/WSC production, a PL/I procedure is
generated. Each one returns a bit: 1 if the recognition was
successful; 0 if it was unsuccessful. The exclusive nature of EBNF
production rules and alternatives is effected by generating nested PL/I
IF-THEN-ELSE statements. Repetition zero or more times is effected by
generating a GO TO to the statement testing for recognition. Subroutine
names embedded in the EBNF/WSC get a CALL generated for them in place.
Calls to other subroutines not explicit in the EBNF/WSC are also
generated. These include "housekeeping" subroutines of the SAP and
calls to LEX, a subroutine to scan and return the next token in the
object language.

To illustrate the code that the SAPG generates, consider the
following representative production rule in the EBNF/WSC and the PL/I
code that corresponds:

<FIELD_STMT> ::= <FIELD> /SVFLD/ <FIELD_ATTR> /STFLD/

The PL/I code that is generated for it by the third pass of the SAPG
would be the following:


FIELD_STMT: PROCEDURE RETURNS(BIT(1));
  CALL $MARK;
  IF FIELD() THEN DO;
    IF ERRORSW THEN DO; CALL $SUCCES; RETURN('1'B); END; ELSE;
    CALL SVFLD;
    IF FIELD_ATTR() THEN DO;
      IF ERRORSW THEN DO; CALL $SUCCES; RETURN('1'B); END; ELSE;
      CALL STFLD;
      CALL $SUCCES; RETURN('1'B);
    END; ELSE DO; CALL $SUCCES; RETURN('1'B); END;
  END; ELSE DO; CALL $FAIL; RETURN('0'B); END;
END FIELD_STMT;


The above code generated by the SAPG would become one procedure in
the SAP. Note that the names that the language definer uses in the
production rule are preserved in the generated SAP code. The
subroutines beginning with dollar signs ($) are "housekeeping" routines
that are internal to the mechanisms of SAPG-generated code.
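
For readers less familiar with PL/I, the control flow of the generated procedure can be paraphrased in Python. The class below is a stand-in written for this illustration: the recognizers for FIELD and FIELD_ATTR are replaced by preset boolean results, the housekeeping routines (written with the dollar-sign prefix described above) only record that they were called, and the class name `GeneratedParserSketch` is hypothetical.

```python
class GeneratedParserSketch:
    """Stand-in for SAP internals; all helper behaviour here is assumed."""

    def __init__(self, field_ok, field_attr_ok):
        self.field_ok = field_ok            # result of recognizing <FIELD>
        self.field_attr_ok = field_attr_ok  # result of recognizing <FIELD_ATTR>
        self.error_sw = False               # ERRORSW in the PL/I listing
        self.calls = []                     # order of subroutine calls

    def call(self, name):
        self.calls.append(name)

    def field_stmt(self):
        """Paraphrase of the FIELD_STMT procedure; returns True for '1'B."""
        self.call("$MARK")
        if not self.field_ok:
            self.call("$FAIL")
            return False
        if self.error_sw:
            self.call("$SUCCES")
            return True
        self.call("SVFLD")                  # /SVFLD/ follows <FIELD> in the rule
        if self.field_attr_ok:
            if self.error_sw:
                self.call("$SUCCES")
                return True
            self.call("STFLD")              # /STFLD/ follows <FIELD_ATTR>
        self.call("$SUCCES")
        return True


parser = GeneratedParserSketch(field_ok=True, field_attr_ok=True)
print(parser.field_stmt(), parser.calls)
```

As in the listing, a failure of FIELD_ATTR after FIELD has been recognized still returns success; only a failure of the first component returns '0'B.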


3.2  SUPPORTING  SUBROUTINES  FOR  EBNF  OF  MODEL 

A  refined  system  flowchart  of  the  SAPG  and  SAP  showing  the  types  of 
supporting  routines  appears  in  Figure  3.3. 
(3) error message handling routines;

(4) encoding routines to compact information for further efficient
processing; and

(5) statement storage routines.

The cross-reference report produced during this phase is generated
by a manually-written program (XREF) and is described in section 3.4.

A  discussion  on  how  to  decide  where  to  insert  subroutines  as  well 
as  a  tabular  summary  of  all  routines  used  appears  in  section  3.2. 
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3.2.1  THE  LEXICAL  ANALYZER 

The purpose of the lexical analyzer is to scan for syntactic units,
or "tokens", using such delimiters as blanks and certain punctuation
marks, and to return tokens to the Syntax Analysis Program (SAP) for
syntactic checking. The automatically-generated SAP calls upon the
lexical analyzer (LEX) whenever it needs the next token. The lexical
analyzer is based on the finite state machine concept. Each state of
the machine corresponds to a condition in the lexical processing of a
character string. At each state, a character is read, an action is
taken based on the character read (such as concatenating the current
character to previous ones or returning the entire token to the SAP),
and the machine changes to a new state. The character classes for the
MODEL language, for the purposes of lexical analysis, appear in Table
3.1. These classes divide the entire character set into categories such
as illegal characters, delimiters, "normal" characters, etc. The
state transition matrix for the MODEL language appears in Table 3.2.
The rows of the matrix represent the character classes of the previous
character, while the columns represent those of the current character.
The entries in the matrix indicate the action to be taken and the next
state. The action taken in each state is summarized in Table 3.3. The
actions involve such steps as concatenating a character, ignoring a
character, detecting an illegal character, returning a complete token to
the SAP, etc., and setting a "next state".
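
The table-driven scanning described above can be sketched as follows.
This is a minimal illustration in Python, not the Processor's own LEX:
the five character classes, the delimiter set, and the names
char_class, action_for and tokens are assumptions made for the sketch,
and only four of the seven actions of Table 3.3 are modelled.

```python
# Hypothetical, simplified character classes; Table 3.1 defines more.
NAME, SPACE, DIGIT, DELIM, ILLEGAL = range(5)

# Action numbers follow Table 3.3; only four of the seven are modelled.
CONCAT, END_WORD, SKIP, REJECT = 1, 2, 3, 7

DELIMITERS = ".,;:()<>=+-*/|&"   # assumed delimiter set for the sketch


def char_class(ch):
    """Map one character to its (simplified) lexical class."""
    if ch.isalpha() or ch == "_":
        return NAME
    if ch == " ":
        return SPACE
    if ch.isdigit():
        return DIGIT
    if ch in DELIMITERS:
        return DELIM
    return ILLEGAL


def action_for(previous, current):
    """Look up the action for a (previous class, current class) pair."""
    if current == ILLEGAL:
        return REJECT
    if current == SPACE:
        return SKIP
    if current == DIGIT and previous in (NAME, DIGIT):
        return CONCAT            # digits extend a name or a number
    if current == NAME and previous == NAME:
        return CONCAT
    return END_WORD              # any other pair starts a new token


def tokens(text):
    """Return the list of tokens found in text."""
    found, word, previous = [], "", SPACE
    for ch in text:
        current = char_class(ch)
        action = action_for(previous, current)
        if action == REJECT:
            raise ValueError(f"illegal character {ch!r}")
        if action == CONCAT:
            word += ch
        else:
            if word:
                found.append(word)   # hand the finished token to the caller
            word = ch if action == END_WORD else ""
        previous = current           # the class just read is the next state
    if word:
        found.append(word)
    return found
```

Under these assumptions, tokens("A1 = B+ 22;") separates the names, the
number and the three delimiters into six tokens, and an unclassified
character such as "@" raises an error in place of Action 7.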


Character Set            Explanation

A B ... T Z _ II         Characters in names
space                    Delimiter
0 1 2 ... 9              Numerals
.<+_)>,%«•               Delimiters
                         Delimiter in logical exp
|                        "OR" symbol
                         Multi. or comment in "/*"
                         "NOT" symbol
                         minus symbol
/                        Division or comment
                         Delimiter in logical exp
                         Delimiter and logical exp
all others               Illegal


Table  3.1  Character  Classes  for  MODEL  Language 


                      Character Class (next)
(current)    0   1   2   3   4   5   6   7   8   9  10  11  12  13

    0        1   2   1   2   2   2   2   2   2   2   2   2   2   7
    1        1   3   1   5   1   1   1   1   1   1   1   1   1   7
    2        1   2   1   2   1   2   2   2   2   2   2   2   2   7
    3        2   2   2   2   2   2   2   2   2   2   2   2   2   7
    4        2   2   1   2   2   2   2   2   2   2   2   2   2   7
    5        2   2   2   2   2   2   2   2   2   2   2   2   1   7
    6        2   2   2   2   2   2   1   2   2   2   2   2   2   7
    7        2   2   2   2   2   2   2   1   2   2   2   2   2
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Table  3.2  State  Transition  Matrix  for  MODEL  Lexical  Analyzer 
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Action 1:  Concatenate next character to current token
Action 2:  End word with next character
Action 3:  Skip blanks sequence
Action 4:  Reserved (not used)
Action 5:  Scan forward one character and save as token
Action 6:  Comment bracket; scan to end of comment
Action 7:  Illegal character(s); print error message


Table  3.3  Lexical  Analysis  Actions 


3.2.2  STATEMENT  SEMANTICS  ANALYSIS 


Some of the semantics of the specification statements can be
checked during the syntax analysis phase. Such routines can check that
a range or condition on a syntactic unit is locally correct. These
routines do not and cannot check the overall consistency, completeness,
or correctness of the logic of the MODEL specification, a task which is
performed by a later phase of the Processor. An example of a local
semantics checking routine is one which checks the range of a numeric
computation. For instance, if a group of data is said to occur n to m
times, a subroutine exists to check the condition 0 <= n < m < 32768.
These manually-written routines are invoked automatically by the SAP by
virtue of their specification in the EBNF/WSC of the MODEL language for
the SAPG. The semantic checking routines are listed in Table 3.4.
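
A local check of this kind is small enough to sketch directly. The
Python below is illustrative only: the function name check_min_max and
the constant MAX_OCCURS are assumptions, and the condition is coded as
0 <= n < m < 32768.

```python
MAX_OCCURS = 32768  # exclusive upper bound on the occurrence count


def check_min_max(n, m):
    """Return True when the occurrence range 'n to m' is locally valid."""
    return 0 <= n < m < MAX_OCCURS
```

A routine of this shape sees only the two numbers in one statement; it
cannot tell whether the range is consistent with the rest of the
specification, which is left to the later phase.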


Semantics Checking Routines
NAME      WHAT IT DOES

ASSINIT   Initializes number of sources/targets to assertion
CATREC    Recognizes the operator '||'
BITSTR    Checks that an alleged bit string contains only the
          digits 0 and 1
CKMNMX    Checks proper range for minimum and maximum
EXPREC    Recognizes the operator '**'
FNCHECK   Checks that a candidate name is a recognized function
          name
INITQNK   Initializes number components to qualified
INITSFL   Initializes source file list
INITTFL   Initializes target file list
INTOASS   Returns 1 if the current scanned statement is an
          assertion and not a data description statement
INTOODL   Records that the statement scanned is a data
          description statement
INTREC    Recognizes integer
MEMINIT   Initializes number of members of record or group

Table  3.4  Semantics  Checking  Routines 
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Semantics Checking Routines
NAME      WHAT IT DOES

MKQNM     Concatenates qualified name components
MOPREC    Recognizes a multiplication operation, i.e. '*' or '/'
NAMEREC   Name recognizer; checks not keywords
OPREC     Recognizer for the operators
OR_REC    Recognizes the alternation operation '|'
RECNUM    Recognizes and scans a number
RELREC    Recognizes any of the relational operators
SETBIT    Used to set and reset a bit that indicates whether the
          statement is an assertion or a data description
          statement
STARREC   Recognizes a '*' for indefinite repetition
SVASSR    Saves the actual assertion itself during the scanning
          of a statement
SVENDC    Recognizes a ';' as an end-of-statement character


Table  3.4  Semantics  Checking  Routines 


3.2.3  ERROR  MESSAGE  STACKING  ROUTINE 

There is a subroutine which stacks error diagnostics to print out
upon recognition of a syntactically-incorrect user statement. Upon
reaching incorrect syntactic units, the automatically generated SAP does
not print its own messages, but expects the corresponding diagnostics to
be on an "error stack". Specifically, an error code has to be stacked
for each expected terminal symbol in the MODEL language in case the
token is missing or incorrect. If the expected token is found, the SAP
simply pops the corresponding error code and continues; if the expected
token is missing or incorrect, the SAP pops the corresponding error
code, prints the statement number, the unexpected token, and the
corresponding error message, scans for the end-of-statement
delimiter (;), and continues. The routine that stacks such error codes
is called "E". Each syntax error message pinpoints the token that is
incorrect, missing, unexpected, or misspelled.
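
The stack discipline just described can be sketched as below. This is
an illustrative Python sketch rather than the SAP's generated code: E
is the routine named in the text, while expect, skip_statement, the
report list and the report line format are assumptions. The two
messages are taken from Table 3.5.

```python
# Two messages copied from Table 3.5; a full table would hold them all.
ERROR_MESSAGES = {24: "Missing right parenthesis",
                  38: "Keyword THEN is missing"}

error_stack = []   # codes stacked by E, most recent on top
report = []        # lines of the Error Diagnostics Report (assumed format)


def E(code):
    """Stack the error code to use if the next expected token fails."""
    error_stack.append(code)


def expect(expected, actual, statement_number):
    """Pop the stacked code; report it only if the token is wrong."""
    code = error_stack.pop()
    if actual == expected:
        return True
    report.append(
        f"STATEMENT {statement_number}: {actual!r}: {ERROR_MESSAGES[code]}")
    return False


def skip_statement(token_list, position):
    """Return the index just past the next ';' (end-of-statement)."""
    while position < len(token_list) and token_list[position] != ";":
        position += 1
    return min(position + 1, len(token_list))
```

The code is popped in both cases, so the stack stays balanced whether
or not the expected token was found; only the failing case writes a
report line and resumes scanning after the next ';'.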

One product of the syntax analysis phase is the Error Diagnostics
Report containing the error messages. Each message gives the diagnostics
corresponding to the error code and provides the exact location of the
error so that it can be corrected and resubmitted by the user easily.
If no syntax errors are found during the syntax analysis phase, a
message is sent that "NO ERROR OR WARNINGS DETECTED", and the Processor
proceeds to the next phase. But if error diagnostics were produced, a
flag is set to disable continuation of analysis and design beyond the
syntax checking phase.

The error messages are listed in Table 3.5.
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ERROR MESSAGES
CODE  ERRORS

  1   A bit string contains a character other than 0 or 1
  2   Missing '<' after the word BLOCK
  3   Badly formed boolean expression after IF in statement
  4   Missing or invalid numeric constant in iterative count spec
  5   Missing or invalid numeric constant in relative error spec
  7   Organization type missing or illegal in DISK statement
  9   Type disk missing or illegal in DISK statement
 12   Missing ELSE in conditional expression
 14   Assertion missing after the keyword THEN
 18   No boolean expression after the keyword IF
 22   No expression after the keyword '('
 23   Keyword is missing
 24   Missing right parenthesis
 26   Missing string after quote
 33   Error in recognition of a right hand side of an assertion
 38   Keyword THEN is missing
 39   Record or group keyword expected
 42   Record name missing or illegal in FILE or REPORT statement

Table  3.5  ERROR  MESSAGES 


ERROR MESSAGES
CODE  ERRORS

 44   Medium name missing or illegal in FILE or REPORT
 45   Keyname missing in FILE or REPORT statement
 46   Maximum length missing or illegal in variable length in
      FIELD statement
 47   Invalid or missing field type in field/interim statement
 48   Missing or invalid length in field/interim statement
 49   Missing right parenthesis after field-type in field/interim
 50   sign is not succeeded by an integer
 51   Missing/invalid max no. of occurrences of items
 52   Name missing or illegal in item list
 53   Missing left parenthesis in line spec
 54   Missing integer in line spec
 55   Missing right parenthesis in line spec
 56   Missing/invalid file name after keyword FILE
 57   FORMAT missing/misspelled after RECORD in storage statement
 58   Missing/invalid tape label
 59   Keyword RECORDSIZE missing/misspelled after MAX
 60   Missing/invalid volume name (external or internal)
 61   Missing/invalid device type
 62   Missing/invalid iterative solution method

Table  3.5  ERROR  MESSAGES 


ERROR MESSAGES
CODE  ERRORS

 63   Colon missing after keyword MODULE
 64   Name missing or illegal in MODULE statement
 65   Error in assembly of a number constant
 66   Tape spec. parameter missing or illegal
 67   Error in a picture spec
 68   Qualified name illegal
 69   Record format missing or illegal
 70   Keyword BLOCKSIZE missing in record format spec
 71   Blocksize value missing/illegal in record format spec
 72   Record size value missing/illegal in record format spec
 74   Missing ';' at end of statement
 75   Missing ':' after keyword SOURCE FILES
 76   Name missing/illegal in source file list
 77   ':' missing after keyword TARGET
 78   Name missing/illegal in TARGET file list
 79   Missing THEN in conditional expression
 80   Unrecognizable statement
 81   Badly formed arithmetic expression
 82   Badly formed boolean expression
 83   Badly formed boolean term
 84   Badly formed concatenation of expressions

Table  3.5  ERROR  MESSAGES 
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ERROR  MESSAGES 
CODE  ERRORS 

85  Badly  formed  factor 

86  Badly  formed  primary 

87  Badly  formed  term 

90  Missing  left  parenthesis  in  column  spec 

91  Missing  integer  in  column  spec 

92  Missing  right  parenthesis  in  column  spec 

101  Length of picture spec. is too small or too big

102  Specified  length  is  inappropriate  for  specified  type 
of  data 

104  Specified  maximum  length  is  inappropriate  or  too 
small 

105  The  fraction  point  offset  is  outside  of  bounds 
-128<p<127 

106  Bad  repetition  specification 

107  Illegal  character  in  picture  specification 

108  Expecting  a  level  number  in  a  structured  data 

description  statement 

Table  3.5  ERROR  MESSAGES 


3.2.4  ENCODING  USER  STATEMENTS 

These  supporting  routines  encode  some  of  the  MODEL  specification 
into  an  internal  representation.  Although  all  of  the  names  provided  by 
the  user  specification  are  kept  intact  in  internal  form  for  use  by  the 
object  program,  many  of  the  descriptions  and  attributes  are  encoded  for 
more compact and efficient processing later. For example, the
description in a FIELD statement enters an internal table where the type
of field is encoded (0 for character, 1 for binary, 2 for numeric,
etc.), and the field length type is encoded (0 for fixed length, 1 for
variable length). One encoding routine is written for each statement
type.  Each  routine  is  invoked  automatically  after  recognition  of  the 
syntactic  unit  by  the  SAP.  The  invocation  is  automatically  generated  as 
part  of  the  SAP  by  the  SAPG  by  virtue  of  its  specification  in  the 
EBNF/WSC.  The  internal  format  of  the  tables  is  given  in  the  next 
section  in  conjunction  with  the  discussion  of  the  internal,  associative 
storage  of  the  MODEL  statements. 
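
The FIELD example above amounts to replacing keywords by small integer
codes while leaving the user's names intact. A minimal Python sketch,
assuming the keyword spellings shown (only the code values 0, 1, 2 and
0, 1 come from the text; the function and table names are
hypothetical):

```python
# Code values are those given in the text; keyword spellings are assumed.
FIELD_TYPE_CODES = {"CHARACTER": 0, "BINARY": 1, "NUMERIC": 2}
LENGTH_TYPE_CODES = {"FIXED": 0, "VARIABLE": 1}


def encode_field(name, field_type, length_type):
    """Build an encoded table entry for one FIELD statement."""
    return {
        "name": name,                                  # kept intact
        "type": FIELD_TYPE_CODES[field_type],          # encoded
        "length_type": LENGTH_TYPE_CODES[length_type]  # encoded
    }
```

For instance, encode_field("PRICE", "NUMERIC", "FIXED") keeps the name
PRICE and stores the codes 2 and 0 for its type and length type.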

The  encoding  and  saving  routines  are  listed  in  Table  3.6. 
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ENCODING/SAVING ROUTINES
NAME      WHAT IT DOES

INITNUM   Initialise scanning a numeric constant
SETDEVB   Set device flag in media description to imply disk
          storage
SETDEVC   Set device flag in media description to imply that
          input is from cards
SETDEVP   Set device flag in media description to imply PRINTER
SETDEVT   Set device flag in media description to imply a
          terminal
SETDEVU   Set device flag in media description to imply a card
          punch
SETFUNC   Initiate a node in the syntax tree to store a function
          reference
SETNUM    Set for assembling a constant number
SETSTRN   Initiate a node in the syntax tree to store a string
          constant
SETSUBV   Initiate a node in the syntax tree to store a
          subscripted variable
SETVAR    Initiate a node in the syntax tree to store a variable
          name
STALL     Stores a node in the syntax tree after all its
          components have been defined

Table 3.6 ENCODING/SAVING ROUTINES


ENCODING/SAVING ROUTINES
NAME      WHAT IT DOES

STBIT     Sets the current string contained in the temporary
          node to be a bit string
STDEV     Store device; Tape or Disk
STFUN     Stores a node in the syntax tree which contains a
          function name
STNUM     Concludes the assembly of a constant number
STPIC     Concludes the storing of a picture type specification
STR_CON   Stores a node in the syntax tree which contains a
          general constant
STRHS     Stores an assertion in the associative memory (an
          entry point in ASSINIT)
SVAAS1    Sets a node to contain a conditional assertion
SVASAE1   Sets to define a node containing a simple assertion
SVBEXP    Sets a node for storing a boolean expression
SVBF1     Sets a node for storing a boolean factor
SVBUC     Saves block size in disk/tape storage entry
SVBT1     Sets a node for storing a boolean term
svee      Encodes character code

Table 3.6 ENCODING/SAVING ROUTINES


ENCODING/SAVING ROUTINES
NAME      WHAT IT DOES

SVCMP1    Save in a node the recently scanned syntactical unit
          as the first descendant
SVCOL     Saves column number in field storage entry
SVCON     Sets a node for storing a concatenation of expressions
SVCOND    Sets a node for storing a conditional exp.
SVDEN2    Saves density for tape
SVDEV     Set device name to storage name, and save device:
          Tape or Disk
SVFAC     Sets a node for storing a factor
SVFDTP2   Encodes field type, including NUM and DEC
SVFLD     Encodes field statement type as FLD
SVFUM     Save file name. Call SVFILE, set default name for
          record storage, and reset device bit (DEVBIT)
SVKEY     Saves key field in file storage entry
SVLAB     Encodes label type in tape statement:
          0-none, 1-IBM_STD, 2-ANSI_STD, 3-BYPASS
SVLAB2    Save label for tape
SVLINE    Saves line number in field storage entry
SVMEN     Saves member name in record/group storage entry

Table 3.6 ENCODING/SAVING ROUTINES


Table 3.6 ENCODING/SAVING ROUTINES

ENCODING/SAVING ROUTINES
NAME      WHAT IT DOES

SVMNFLN   Saves minimum field length in FLD statement
SVMNOC    Saves minimum number of occurrences in record or
          group storage entry
SVMOD     Marks the mode as FIXED or FLOATING
SVMXFLN   Saves maximum field length in FLD statement
SVMXOC    Saves maximum number of occurrences in record or
          group storage entry
SVNUMTP   Marks the data type as a numeric data type (BINARY
          or DECIMAL)
SVNXCMP   Saves the next assembled syntactical unit in a syntax
          node which is its ancestor
SVNXDP    Saves the next delimiter associated with the assembled
          syntactical unit or separating it from its successor
SVOP1     Saves an initial delimiter associated with phrase such
          as unary '-' or 'IF'
SVORG3    Saves organization for disk
SVPAR2    Saves parity for tape
SVPIC     Denote the data as 'PICTURE'
SVPRIM    Sets for assembling a phrase for a PRIMARY
SVPICST   Saves the picture specification string
SVQTY2    Saves quantity for disk


Table 3.6 ENCODING/SAVING ROUTINES

ENCODING/SAVING ROUTINES
WHAT IT DOES

Saves record size in tape/disk storage entry

Encodes record format on tape/disk storage:
0-FIXED, 1-FIXED BLOCK, 2-VARIABLE

Saves the scale factor specified in the precision specification
of the data type

Saves source file name in source storage entry

Records and saves the repetition spec. '(*)' in a file statement

Save start file# for tape

Saves storage name in FILE storage entry

Transfer an assembled string constant from the general buffer
into a special temporary storage. The final storage of the node
will be done by STR_CON.

Saves target file name in target storage entry

Initializes a node to store a phrase for a TERM

Saves number of tracks for tape

Save units as CYL for disk

Saves volume name in disk/tape storage entry


3.2.5  STATEMENT  STORAGE  ROUTINES 

These routines collect the strings of names and other vital
information in the MODEL statements, and pass them to the STORE system,
which is a subsystem in itself to store the statements for later
processing. Such storage-invoking routines are called at the end of
scanning each MODEL statement, and are the ones that begin with the
letters "ST" (e.g., STFLD, STREC, etc.). The storage subsystem described
below (STORE), which is called by these routines, stores the MODEL
statements in a simulated associative memory that facilitates later
retrieval.

On  analyzing  the  assertions  (computational  statements)  a  syntax  or 
derivation  tree  which  represents  the  assertion  is  generated  and  stored. 
This  representation  facilitates  later  analysis  and  scanning  of  the 
assertion,  as  well  as  systematic  transformation.  The  tree 
representation  is  reconverted  into  text  form  in  the  code  generation 
phase. 

At  the  end  of  the  syntax  phase,  we  have  the  entire  set  of  MODEL 
statements  stored  in  a  convenient  storage  system  for  further  analysis. 
The  storing  subroutines  which  invoke  the  use  of  the  STORE  system  act  as 
an  interface  between  the  automatically  generated  SAP  and  the  storage 
system  presented  below.  The  storage  system  is  an  extension  to  the 
capabilities  of  the  SAPG  since  it  is  general  purpose  in  nature  and  is 
independent  of  the  nature  of  the  language  specified,  and  could  be  used 
for  processing  other  languages. 
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The storing routines are listed in Table 3.7.
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Table  3.7  STORING  ROUTINES 


STORING ROUTINES
NAME      WHAT IT DOES

STFILE    Stores FILE statement
STFLD     Stores FIELD statement
STGRP     Stores GROUP statement
STMOD     Stores MODULE statement
STPNCH    Stores PUNCH statement
STREC     Stores RECORD statement
STSRC     Stores SOURCE FILES statement
STTAR     Stores TARGET FILES statement


3.2.6  HOUSEKEEPING  ROUTINES 

Finally, there are a few "housekeeping" type subroutines which need
not be written by the language definer because they are provided by the
SAPG, but which need to be included in the EBNF/WSC.

The housekeeping routines are listed in Table 3.8.


Table  3.8  HOUSEKEEPING  ROUTINES 


HOUSEKEEPING ROUTINES
NAME      WHAT IT DOES

ADLEX     Adds a subpart of a floating point constant to its
          full representation
CLRERRF   Clears errors flag every statement to indicate no
          syntax errors yet in next statement
ENDINP    Executed upon end-of-file to print last line and
          wrap-up
FREETMP   Frees allocation of a temporary data structure which
          was needlessly allocated
NEGATE    Negates the value of a negative integer constant to
          derive its real representation
STOT_FL   Scans for end of statement delimiters when
          unrecognizable statement encountered
STMTINC   Increments the statement number; called at end of
          each statement


3.2.7  AN  INDEX  TO  SAP  ROUTINES 

The subroutine names used in the specification of MODEL can be
classified into one of the following four types of subroutines:
encoding/saving routines, storing routines, semantics checking routines,
and housekeeping routines. Tables 3.4, 3.6, 3.7, and 3.8 provide a
listing of the routines within each category. As for error
messages, the error codes and their meanings are shown in Table 3.5.


3.3  THE  STRING  STORAGE  AND  RETRIEVAL  SUBSYSTEM 
3.3.1  INTRODUCTION 

The store routines that are referred to in the EBNF description of
MODEL utilize a general-purpose mechanism for storing source language
strings. A similar mechanism is used later for retrieving these source
language strings. The system basically consists of a
directory structure, described in Section 3.3.2, and the format of
storage entries, described in Section 3.3.3. There are also two main
procedures:

(1) STORE, for storing source language strings collected during syntax
analysis. STORE is described in Section 3.3.4.

(2) RETRIEVE, for accessing previously stored source language strings,
based on a variety of "keys". RETRIEVE is described in Section
3.3.5.

Additionally, a set of routines specified in EBNF parses and stores
the assertions. Section 3.3.6 describes the format of stored
assertions, and Section 3.3.7 describes the routines that store the
parsed assertions. These routines have also been referred to in the
description of saving and encoding routines in Section 3.2.4.
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The STORE procedure accepts strings which are formed by the
subroutines called during syntax analysis. It stores the strings in
memory as what we call "storage entries" while building "directory
entries" in a directory of certain names designated as keys. By
building a directory, the strings are stored "associatively" in the
sense that statements can later be retrieved based on their content.
This capability is crucial to a "non-procedural" language processor,
since the statements can be input in any order.


3.3.2  THE  DIRECTORY  AND  STORAGE  STRUCTURE 

The  storage  entries  (the  strings  to  be  stored)  consist  of  two 
parts: 

(1) the key names to be entered in the directory, which include the names
the user provided in the MODEL statements for naming data, assertions,
etc. These are the names by which we may want to retrieve information
later.

(2)  auxiliary  data  from  the  source  language  strings  including  the 
encoded  information  in  table  form.  This  information  is  not  used  as  the 
basis  of  retrievals. 

Each  storage  entry  will  contain  information  from  a  given  MODEL 
statement.  They  will  appear  in  memory  in  the  order  in  which  they  are 
processed. 

The directory consists of an entry for each key name. Each
directory entry points to the first storage entry containing that key
name. A linked list is then maintained from the first storage entry
with that key name to other storage entries containing the same key
name. A binary tree structure was chosen for the directory to make tree
modifications and key name searches efficient. The first key
name entered in the directory becomes the root of the directory
tree; the next key is entered "above" or "below" it in the tree by
lexicographic order; etc.

Each  directory  entry  has  the  following  form: 


| Key name | Ptr-to-first | Up-pointer | Down-pointer |


where "Key name" is a string of (up to) 10 characters (padded with blanks
on its right side);

"Ptr-to-first" is a pointer to the first storage entry containing the
"key name";

"Up-pointer" and "Down-pointer" are pointers to other directory entries,
whose key names are up or down, respectively, in the lexicographic
sense.
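
The directory entry can be pictured as a node of a binary search tree.
The Python sketch below is an illustration only: the class and function
names are hypothetical, and it assumes that "up" holds the keys that
sort earlier and "down" the keys that sort later, a direction the text
does not state.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DirectoryEntry:
    key_name: str                            # 10 characters, blank padded
    ptr_to_first: Any = None                 # first storage entry with key
    up: Optional["DirectoryEntry"] = None    # assumed: keys sorting earlier
    down: Optional["DirectoryEntry"] = None  # assumed: keys sorting later


def pad(name):
    """Truncate or blank-pad a key name to exactly 10 characters."""
    return name[:10].ljust(10)


def find_or_create(root, name):
    """Return (root, entry), entering the key in the tree if it is new."""
    key = pad(name)
    if root is None:
        entry = DirectoryEntry(key)
        return entry, entry              # the first key becomes the root
    node = root
    while key != node.key_name:
        branch = "up" if key < node.key_name else "down"
        child = getattr(node, branch)
        if child is None:
            child = DirectoryEntry(key)
            setattr(node, branch, child)
        node = child
    return root, node
```

Entering PRICE, QUANTITY and EXTENT in that order leaves PRICE at the
root with EXTENT on its "up" side and QUANTITY on its "down" side;
asking for PRICE again returns the existing entry.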


Each  storage  entry  has  the  following  form: 
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3.3.3  STORAGE  ENTRIES  FORMAT  FOR  MODEL  STATEMENTS 

The STORE mechanism, described in the next section, is called by
SAP's storing subroutines to store the MODEL statements for retrieval
(by RETRIEVE) in the later phases. For each type of MODEL statement,
the key names in it are stored in its storage entry. The non-key
information in the MODEL statement (information which is not used to
specify retrievals) is kept in description tables, which are connected
(by STORE) to the corresponding storage entries as was shown above.
Table 3.9 summarizes the internal format of the storage entries and the
corresponding description tables for each type of MODEL statement. The
leftmost name in each entry is the name of the statement being stored.
The middle column shows the information appearing in the corresponding
storage entry (with the pointers omitted due to lack of space). The
right column shows the additional encoded information, if any, from the
statement. The key names beginning with a dollar sign ($) in the
storage entries are not user-provided, but are inserted by the system
for its own information. The last name in each storage entry, for
example, identifies the type of statement.
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Table  3.9  Storage  entries  Format  for  MODEL 


MODEL Statement Syntax         Storage Entry Key Names       Auxiliary Descriptions

MODULE: module-name            module-name $MODULE           MODL  n

SOURCE FILES: s1, s2, ... sn   $SOURCE s1 s2 ... sn          SRCF  n

TARGET FILES: t1, t2, ... tn   $TARGET t1 t2 ... tn          TARF  n

filename IS FILE( ...,         filename r s k $FILE          FILE  n
  STORAGE IS s, RECORD r,                                    ORG code: 0-SAM, 1-ISAM
  KEY IS k, ORG IS o)                                        Key flag: 0-no sort key,
                                                             1-sort key

record-name IS RECORD          record-name m1 m2 ... mn      RECD  n
  (m1, m2, ..., mn)            $Pfile $RECD                  #members, members/subscripts
                                                             (first sub., second sub.)

group-name IS GROUP            group-name m1 m2 ... mn       GRP   n  (same as record)
  (m1, m2, ..., mn)            $Pfile $GRP                   0-no repeat, 1-r repeats



3.3.4 THE STORE PROCEDURE

The STORE(S,D) procedure has two parameters, S and D. S is the
string containing the key names which are to be stored and to be entered
in the directory. D is a pointer to previously built auxiliary data
from the source string. The latter usually is an encoded form of
non-key source language information.

Algorithm STORE shows the storing procedure. STORE receives the
key names from S and creates a storage entry for them (Steps 1-3). It
checks if they are in the directory (Steps 4-5, subroutine SEARCH_DIR).
If the key is in the directory, then it follows the "pointer-to-first",
which points to the first storage entry with that name (Steps 7-8). The
array of strings in that storage entry is scanned until the key name is
found. If its "next" pointer is null (end-of-list), then it is set to
point to the newly created storage entry (Steps 8-11). If it is not,
the process is repeated until a null (end-of-list) pointer is found
(Steps 9-10). If the current key name is not found in the directory, it
is entered at its lexicographical position in
the directory (Step 6, subroutine CREATE_DIR) and the pointer in the
directory is set to point to the newly created first storage entry
(Steps 7-8).
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Algorithm STORE: The Store Procedure

Parameters: S-string of keys to be stored;

D-pointer to other data

(see Section 2.3.2 for diagrams of Data Structures)

[Subroutines called: CHECK_DIR, CREATE_ENTRY]

Step 1. Count #KEYS.

Step 2. Allocate the storage entry for S (call it SE, according to the
format shown).

Step 3. Connect PTR_TO_DATA in SE to D.

Step 4. For each key name, perform steps 5 through 11.

Step 5. If key exists in the directory (Algorithm CHECK_DIR), then go
to step 7; else go to step 6.

Step 6. Create a directory entry for this key (Algorithm
CREATE_ENTRY).

Step 7. Let DE = this directory entry.

Step 8. If PTR_TO_FIRST in DE already points to a first storage entry
with this key name, then go to step 9; else go to step 11.

Step 9. Get the next storage entry in the list.

Step 10. If it is the last in list, then go to step 11; else go to
step 9.

Step 11. Add the new SE to the list.

Step 12. Return.
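
The steps of Algorithm STORE can be sketched as follows. This Python
version is illustrative and makes several assumptions: a plain dict
stands in for the binary-tree directory, key names in S are separated
by blanks, each storage entry keeps one "next" pointer per key, and the
names StorageEntry, store and chain as well as the key $FLD in the
usage note are hypothetical.

```python
class StorageEntry:
    """One stored statement: its key names, chain pointers, and data."""

    def __init__(self, keys, data):
        self.keys = list(keys)
        self.ptr_to_data = data                     # step 3
        self.next = {key: None for key in keys}     # one link per key


def store(directory, s, d):
    """Store the keys in s, chaining the new entry under every key."""
    keys = s.split()                                # step 1
    se = StorageEntry(keys, d)                      # steps 2-3
    for key in se.next:                             # step 4
        first = directory.get(key)                  # step 5
        if first is None:
            directory[key] = se                     # steps 6-8, then 11
            continue
        current = first
        while current.next[key] is not None:        # steps 9-10
            current = current.next[key]
        current.next[key] = se                      # step 11
    return se                                       # step 12


def chain(directory, key):
    """List every storage entry containing key, in order of storage."""
    entries, current = [], directory.get(key)
    while current is not None:
        entries.append(current)
        current = current.next[key]
    return entries
```

Storing "INVOICE PRICE QUANTITY $RECD" and then "PRICE $FLD" leaves
both entries on the chain for PRICE, in the order in which they were
processed, while the chain for INVOICE holds only the first.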
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3.3.5  THE  RETRIEVE  PROCEDURE 

RETRIEVE(E,D,S,N,P) is the procedure for retrieving desired storage
entries, by searching through the data structures depicted in Figure 3.4
and Table 3.4. It is invoked by many routines described in subsequent
phases of the Processor. It has five parameters as indicated.
RETRIEVE finds all the storage entries in which the given key name or
expression of key names, E, appears and furthermore checks whether the
first characters of data associated with the storage entries match the
string D. That is, RETRIEVE finds all the storage entries with keys
satisfying the logical expression E and other data D. RETRIEVE starts
its search at directory entry S, normally the root node of the
directory, and it returns a list of pointers, P, to those storage entries
which satisfy the request of the calling program. The number of storage
entries satisfying the request is returned in N.

The logical expression used to retrieve strings can be any boolean
expression involving "key" names or names in the MODEL statements in
disjunctive normal form, where the first key in each term is
non-negated. For example, consider the following statement by a calling
program:

CALL RETRIEVE(KEYS, '', START, N, P);

KEYS might contain the string value 'PRICE &
¬QUANTITY | EXTENT'. This makes RETRIEVE find all storage entries
(which correspond to all statements in the MODEL specification) in which
PRICE appears and QUANTITY does not appear, or statements in which
EXTENT appears. The null second parameter means that the auxiliary data
portion of each statement is immaterial. RETRIEVE would then start its
search and return a list of pointers in P to those storage entries which
satisfy the condition, and N would be set to the number of statements
that satisfy the condition.

Algorithm RETRIEVE is shown below.

Algorithm RETRIEVE: The Retrieve Procedure

Parameters: E = logical expression string; S = pointer to beginning of
directory (input); P = list of pointers satisfying E; N = number of
satisfying entries (see Figure 7a for diagrams of data structures)


Step 1. Get leading key name K of next conjunct from E. If
no more, go to step 22.

Step 2. Check directory for K (standard binary tree search
in subroutine SEARCH-DIR given earlier).

Step 3. If found, then go to step 4; else go to step 1.

Step 4. Set PSE = PTR_TO_FIRST (pointer to first storage entry
with K).

Step 5. Add PSE to W list (temporary list of pointers).

Step 6. If K in PSE storage entry points to another storage
entry with K, then go to step 7; else go to step 8.

Step 7. Set PSE to next storage entry in the list, go to
step 5.

Step 8. If end of E, then go to step 20; else go to step 9.

Step 9. Get next symbol in E.

Step 10. If symbol = '&' then go to step 14; else go to step 11.

Step 11. If symbol = '|' then go to step 12; else error
return.

Step 12. Add list of pointers in W to list of pointers in P
without duplication.

Step 13. Go to step 1.

Step 14. Get next symbol.

Step 15. If symbol = '¬' then go to step 16; else go to step 18.

Step 16. (Case of conjoining negated term) eliminate
pointers in W to storage entries which also contain next key
name in E.

Step 17. Go to step 8.

Step 18. (Case of conjoining non-negated term) eliminate
pointers in W to storage entries which do not contain next
key name in E.

Step 19. Go to step 8.

Step 20. Add list of pointers in W to list of pointers in P.

Step 21. Set N = number of pointers in P list.

Step 22. Return.
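
The selection logic of the algorithm can be sketched compactly in Python. This is an illustration only, not the Processor's code: the directory is assumed to be a plain dictionary from key name to a list of storage-entry numbers (standing in for the binary tree and the chained storage entries), negation is assumed to be written with the '¬' character, and the name `retrieve` is hypothetical. The D and S parameters are omitted.

```python
def retrieve(expr, directory):
    """Return (N, P) for an expression in disjunctive normal form.

    directory: dict mapping a key name to the list of storage-entry
    numbers in which that key appears (an assumed stand-in structure).
    """
    result = []                                    # the P list
    for conjunct in expr.split('|'):
        terms = [t.strip() for t in conjunct.split('&')]
        if not terms[0] or terms[0].startswith('¬'):
            raise ValueError('first key of each conjunct must be non-negated')
        work = list(directory.get(terms[0], []))   # the W list (Steps 2-7)
        for term in terms[1:]:                     # Steps 8-19
            negated = term.startswith('¬')
            holders = set(directory.get(term.lstrip('¬').strip(), []))
            # keep entries that contain the key, or that lack it if negated
            work = [p for p in work if (p in holders) != negated]
        for p in work:                             # Steps 12 and 20
            if p not in result:
                result.append(p)
    return len(result), result


directory = {
    'PRICE': [1, 2, 3],
    'QUANTITY': [2, 5],
    'EXTENT': [3, 4],
}
n, p = retrieve('PRICE & ¬QUANTITY | EXTENT', directory)
print(n, p)
```

Entries 1 and 3 satisfy the first conjunct and entries 3 and 4 the second, so the merged list holds three pointers with entry 3 recorded once.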


An  example  showing  the  retrieval  mechanism  to  retrieve  all  storage 
entries  with  key  names  "B"  and  "C"  is  given  in  Figure  3.5. 


Fig. 3.5 Example of Retrieval Mechanism

The diagram shows in parentheses the corresponding steps of the
algorithm. RETRIEVE starts by getting the leading key name of the first
conjunct (Step 1) and searches the directory for it (Step 2). If found,
it puts the list of pointers to all storage entries with that name in a
temporary list (Steps 3-7). If there are other names in the conjunct
(Steps 10, 14), then RETRIEVE eliminates from the temporary list those
pointers whose storage entries do not have the other names in the
conjunct (Steps 14-16). If there are more conjuncts in the expression,
then the process is repeated and additional pointers are added to the
list (Steps 12-13). When the end of the expression is reached, the list
of pointers to the satisfying storage entries and the number of pointers
are returned (Steps 20-22).


3.3.6  STORAGE  STRUCTURES  FOR  ASSERTION  STATEMENTS 

Analysis of an assertion statement causes two storage entries to be
made for the statement (see also Table 3.9). The first entry has the
type  ASTX  and  contains  in  its  main  part  just  the  assertion  label  (system 
generated)  and  a  keyword  3 ASSERT.  Its  auxiliary  data  contains  a  pointer 
to  the  syntax  tree  which  represents  in  a  parsed  form  the  body  of  the 
assertion.  The  second  entry  has  the  type  ASTG  and  contains  a  list  of 
all the names which are sources and targets of the assertion. Sources
are all the names which appear on the right hand side of each equal
sign (including subscript expressions) and within boolean condition
expressions. Targets are the names whose values are defined by the
assertion.


3.3.6.1 THE SYNTAX TREE FOR AN ASSERTION

The syntax tree of an assertion is constructed out of mutually
linked nodes. There are nodes of two types: non-terminal nodes, which
have descendants, and terminal nodes, which have no descendants and
represent atomic syntactical units such as identifiers, numeric and
string constants. Each node corresponds to a phrase in the parsed
assertion, and if it is non-terminal the list of its descendants
represents the further breakup of this phrase.


3.3.6.2 THE STRUCTURE OF NON-TERMINAL NODES

The structure of non-terminal nodes is as follows:

|      | Number  | Delimiter | Pointer   |     | Delimiter | Pointer   |
| TYPE | of Sons | #1        | to Son #1 | ... | #n        | to Son #n |


where "TYPE" is an integer code identifying the syntactical type of the
phrase according to the following legend:

0 - Conditional Assertion. Example: IF A=B THEN C=D
1 - Simple Assertion. Example: A=B
2 - Conditional Expression. Example: IF A > B THEN C ELSE 0
5 - Boolean Expression. Example: (A=B) | (C=D)
6 - Boolean Term. Example: (A > 5) & (C <= 3)
7 - Boolean Factor. Example: C = 7
8 - Concatenation. Example: All || 'END'
9 - Arithmetical Expression. Example: (A*B)+(C*D)
10 - Term. Example: A*B
11 - Factor. Example: A**2
12 - Primary. Examples: A, B(I+1), (A+B)
13 - Function. Example: SUM(A,I)
14 - Subscripted Variable. Example: A(FOR_EACH.A)

"Number of Sons" is the number of components or subphrases that the
indicated phrase is broken into. Thus if the phrase is "A+B" it is of
type 9 (Arithmetical Expression) and it is parsed further into the
subphrases "A" and "B". The '+' delimiter will be stored as delimiter
number 2 in the current node.

The delimiters are encoded as integers according to the following
legend:

1 - ' ' (Blank - No delimiter)
2 - 'IF' (keyword)
3 - 'THEN'
4 - 'ELSE'
5 -
6 - '+'
7 - '-'
8 - '*' (Standing for multiplication)
9 - '/'
10 - '**' (Exponentiation)
11 - '|' (Alternation - Logical 'or')
12 - '&'
13 - '||' (Concatenation)
14 - '¬' (Negation)
15 - '('
16 - ')'
17 - '='
18 - '>'
19 - '>='
20 - '<'
21 - '<='
22 - '¬='
23 - '¬>'
24 - '¬<'

"Delimiter i, i=1,...,n" are the delimiters separating the subphrases.
The first one is the delimiter prefixing the whole phrase, such as the
'-' in the phrase -A or the ' ' in the phrase '(A<B & B<C)'. "Pointer
to Son i, i=1,...,n" are pointers to other nodes which represent the
subphrases into which the current phrase is parsed.
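
As an illustration of this layout, the following Python sketch models a non-terminal node for the phrase "A+B". The class name `NonTerminal` and its field names are hypothetical, and sons are shown as plain strings instead of pointers to terminal nodes; only the type code 9 and the delimiter codes 1 and 6 are taken from the legends.

```python
from dataclasses import dataclass, field
from typing import List, Union

ARITHMETICAL_EXPRESSION = 9   # TYPE code for an arithmetical expression
BLANK, PLUS = 1, 6            # delimiter codes for ' ' and '+'


@dataclass
class NonTerminal:
    type: int
    delimiters: List[int] = field(default_factory=list)   # delimiter #i
    sons: List[Union["NonTerminal", str]] = field(default_factory=list)

    @property
    def number_of_sons(self) -> int:
        # One delimiter slot and one son pointer are kept per subphrase.
        return len(self.sons)


# "A+B": blank prefix delimiter before A, then '+' as delimiter number 2.
node = NonTerminal(ARITHMETICAL_EXPRESSION, [BLANK, PLUS], ["A", "B"])
print(node.number_of_sons, list(zip(node.delimiters, node.sons)))
```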


3.3.6.3 THE STRUCTURE OF TERMINAL NODES

Terminal nodes are used to store atomic units such as variable names
and string or numeric constants. Their structure is as follows:


|  type  |  str-length  |  value  | 


where "type" is an integer code identifying the type of the constant
according to the following legend:

20 - character string constant. Example: 'ABC'
21 - function name. Example: SUM
22 - numeric constant. Example: 3.14
23 - variable name. Example: PAY
24 - bit string constant. Example: '1001'B

"Str-length" is the length of the character string representing the
constant. It will be 3 for storing the variable name 'PAY'. "Value" is
the actual character string representing the constant.
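
A terminal node can be pictured as a small record whose length field is derived from the stored string. The sketch below is an assumption-laden illustration in Python: the class name `Terminal` is hypothetical and the type table simply copies the legend.

```python
from dataclasses import dataclass

# Terminal type codes copied from the legend.
TERMINAL_TYPES = {
    20: "character string constant",
    21: "function name",
    22: "numeric constant",
    23: "variable name",
    24: "bit string constant",
}


@dataclass(frozen=True)
class Terminal:
    type: int
    value: str   # the character string representing the unit

    def __post_init__(self):
        if self.type not in TERMINAL_TYPES:
            raise ValueError(f"unknown terminal type {self.type}")

    @property
    def str_length(self) -> int:
        return len(self.value)


pay = Terminal(23, "PAY")
print(pay.type, pay.str_length, pay.value)
```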

During later processing (Module ENEXDP), all the terminal nodes
which refer to non-constants (types 21, 23) are converted to a different
format, referred to as variable-terminal-nodes:

| Type | NODE# |

'Type' as before is an integer code identifying the type of the name
according to the following legend:

25 - Variable type. The associated name is a variable and NODE# is
     the dictionary entry number of this variable.

26 - Subscript type. This stores the name of a subscript. NODE#
     refers to a dictionary entry number. This dictionary entry can
     be of one of the following types:

     'GRP', 'RECD', or 'FLD', which must be repeating. If this entry
     name is X then the name of the subscript is FOR_EACH.X.

     'SSUB' - This is a global subscript declared by the user.

     '$' - This is a free subscript added by the system. It is one
     of the subscripts $1 to $9.

27 - Function Name. NODE# is an index in a list of functions
     recognized by the system. See Table 3.10 for the list.

As an overall example, consider the syntax tree for the assertion:

IF A=B | C<D & E<=F
THEN X(FOR_EACH.X) = (Y+Z)*T || '$'
ELSE X(FOR_EACH.X) = '0';

It is described in Fig. 3.6, with the modification that delimiters are
represented by themselves rather than in their encoded form, to improve
readability.


3.3.7 THE SYNTAX TREE CONSTRUCTION ROUTINES

Several routines are responsible for the construction of the syntax
tree of an assertion. They may be classified and described as follows:

Setup Routines: On entering a parse for a phrase of a certain type (by
SAP) an appropriate setup routine is called. This routine allocates a
temporary node area (temporary since we do not know yet how many
subphrases or components it will have), assigns a type number
corresponding to the type of the phrase and resets a component count to
0.

There is a setup routine corresponding to each phrase type. They are,
for the non-terminal types (listed in increasing type code order):
SVAASO, SVASSR (SVASAE1), SVBEXP, SVBT1, SAVF1,
SVCON, SVAE, SVTERM, SVFAC, SVPRIM, SETFUNC, SETSUBV.

For the terminal types (codes > 19), a string area is allocated and
a type variable is assigned, too. No setup routine exists for bit
strings, since the distinction between a bit string and a character
string can be made only at the end of its scanning.

Save Routines: These are common to all non-terminal phrases. They
alternately store delimiters and pointers to components, increasing the
"number of sons" counter appropriately. These are all stored in the
temporary node storage area.

SVOP1 - Stores a first delimiter. If this routine is not called the
first delimiter is always set to 1 (' ').

SVCMP1 - Stores a pointer to the first component.

SCNXDP - Stores the recently scanned delimiter in the next available
delimiter slot. It then increments the "number of sons" counter.

SVNXCMP - Stores a pointer to the recently assembled subphrase in the
next available component slot.

Storing  Routines:  These  finalize  the  node  structure,  after  scanning  of 
the  phrase  is  complete.  Since  size  of  strings  and  number  of  sons  are 
known  by  this  time,  a  permanent  node  space  is  allocated  and  the  contents 
of  the  temporary  storage  entry  transferred  there.  The  temporary  storage 
area  is  then  freed. 

STALL - This is the storing routine for all the non-terminal nodes.
It first checks whether the assembled node is trivial. A node is
trivial if it contains only one component and the first delimiter is
blank. In this case no permanent storage is made for this node. This
check eliminates redundant nodes in the syntax tree. If the node is not
trivial, a permanent allocation is made for it and the proper contents
are transferred there.
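
The triviality check can be sketched as follows. This is a hedged Python illustration, not the STALL routine itself: the function name `store_node` and the tuple layout of the "permanent" node are assumptions, while the delimiter codes 1 (blank), 6 ('+') and 14 (negation) come from the delimiter legend.

```python
BLANK = 1       # delimiter code for "no delimiter"
NEGATION = 14   # delimiter code for negation


def store_node(node_type, delimiters, sons):
    """Return the finished node, or the lone son if the node is trivial."""
    if len(sons) == 1 and delimiters[0] == BLANK:
        # Trivial: a single component with a blank prefix adds nothing,
        # so the son takes the place of the node in the tree.
        return sons[0]
    return (node_type, len(sons), tuple(zip(delimiters, sons)))


trivial = store_node(10, [BLANK], ["A"])          # a Term wrapping only A
kept = store_node(9, [BLANK, 6], ["A", "B"])      # A+B
negated = store_node(7, [NEGATION], ["A"])        # one son, non-blank prefix
print(trivial, kept, negated)
```

A node with a single son is still kept when its first delimiter is not blank, since the prefix carries meaning.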

For the terminal nodes we have separate storing routines:

STNUM - Stores a numeric constant.

STFUN - Stores a function name.

SVSTRNG - Transfers a string constant to the storage area before calling
on STR_CON.

STBIT - Stores a bit string.

STR_CON - A common routine for storing all constants. It allocates a
permanent node storage and transfers type, length and string
into it.

| ABS    | ADJR   | AMAX    | AMIN      |
| ANY    | BIT    | CEIL    | CHAR      |
| COPY   | DATE   | DECIMAL | EXP       |
| FALSE  | FIXED  | FLOAT   | FLOOR     |
| HIGH   | INDEX  | LENGTH  | LOG       |
| MAX    | MIN    | MOD     | PAGE      |
| REPEAT | ROUND  | RTRIM   | RUNSUM    |
| SELECT | SIGN   | SSN_FN  | STRING    |
| SUBSTR | SUM    | TIME    | TRANSLATE |
| TRUE   | UNSPEC | UPDATE  | VERIFY    |

Table 3.10 The functions recognized by the MODEL processor.

CHAPTER 4

PRECEDENCE  ANALYSIS 


4.1  INTRODUCTION 

A MODEL specification consists of many data description and
assertion statements. In principle, the data description statements
specify the structure of data entities such as file, group, record, and
field. The assertions specify the relationships between the data
entities. The data entities and the assertions are referred to here as
program entities. On the other hand, in an executable program there are
program events such as I/O activities, computations, or getting data
ready. The events in a program generated by the MODEL system correspond
to entities in the specification. For example, a file entity
corresponds to an event of opening a file or closing a file; a record
entity corresponds to reading a record or writing a record; and an
assertion entity corresponds to computing a target variable. The
sequence of the program events is not given by the user. Instead, it is
determined by the MODEL processor under the constraints of precedence
relationships among the program events. In this chapter we discuss the
analysis for recognizing the precedence relationships between program
events and representing them in a directed graph.

Based  on  the  specification  we  can  find  the  unique  symbolic  names 
assigned  by  the  user  to  data  entities.  Additionally  the  MODEL  processor 
automatically assigns a unique name to every assertion. Like
other compilers, the MODEL processor maintains a symbol table, called
the dictionary, which contains all the symbolic names of program
entities and their attributes.

The  dictionary  is  created  by  a  procedure  CRDICT  which  finds  all  the 
entities  in  the  program  specification  and  stores  their  names  into  the 
dictionary.  Except  for  some  special  cases  described  below,  there  is  a 
correspondence  between  each  statement  in  the  specification  and  an  entity 
in  the  dictionary. 

Attributes of a symbol such as the type (file, group, field, etc.),
the number of dimensions, and its structural relation to other
symbols are stored in the dictionary during the process of precedence
analysis, and later during dimension analysis. This information is used
later to determine the execution sequence.


Various types of relationships among program entities have direct
implications for the execution sequence of their corresponding program
events. The precedence relationships among the program events are found
based on the analysis of the program entities. For example, a
hierarchical relationship exists when one data entity contains another,
such as when a file contains a record, a record contains a field,
etc. A dependency relationship exists between a field and an assertion
when the field is either a source variable of the assertion or its
target variable. There are also relationships between data entities and
their associated control variables. The events and their precedence
relations are represented by a directed graph called an Array Graph.

The Array Graph is created by two procedures, ENHRREL and ENEXDP.
The ENHRREL routine analyzes data description statements and finds the
precedence relations caused by the hierarchical relations between data
entities. The ENEXDP routine analyzes assertions and finds the
precedence relations from the dependency relations among data fields and
assertions. It also finds the precedence relations among data entities
and their associated control variables. Since the Array Graph contains
the complete precedence information, it is used to check the
completeness and consistency of the specification and to determine the
computation sequence.


4.2  REPRESENTATION  OF  PRECEDENCE  RELATIONSHIPS 
4.2.1  DICTIONARY 

Every program entity has a full name which uniquely identifies it.
Most of the entities have a single component full name. When two data
entities share the same name, it is necessary to qualify the name with
their respective file names to distinguish them. Two data entities
within one file are not allowed to share the same name. A file name may
have at most two instances denoted as NEW or OLD followed by an
identifier. Thus a data entity may have a full name of three
components: NEW or OLD, file name, and data name. Control variables
have one component more than the associated data entities, i.e., a
reserved key name. The full name and the attributes of each program
entity are stored in the dictionary.

In order to use memory efficiently, memory space for the entries of
the dictionary is allocated dynamically. Pointers to the dictionary
entries are stored in a vector DICTPTR and the total number of pointers
in the vector is denoted as DICTIND. With this arrangement, we can
allocate memory piecewise and access the information randomly. Since
each program entity corresponds to a node in the Array Graph, we will
call its entry number in the dictionary its node number. The
organization of the dictionary is shown in Fig. 4.1 and the attributes
in the dictionary are listed in Table 4.1.

Table 4.1 Attributes in the Dictionary

XDICT - Is the full name of the entity.

XNAMESIZE - Is the number of characters in the XDICT field.

XUNIQUE - Is the smallest name by which the entity can be identified
uniquely. If the file name component of a full name is not
necessary to identify the entity uniquely, then XUNIQUE is set
to the name without the file name component; otherwise, XUNIQUE is
set to XDICT.

XDICTYPE - Specifies the type of the entity. Following are the possible
values:

    ASTX - An assertion.
    GRP - A group.
    FILE - A file.
    RECD - A record.
    MODL - The specification name.
    SPCN - A special name prefixed with a keyword such as END, SIZE,
           LEN, POINTER, NEXT, SUBSET, ENDFILE, and FOUND.
    SSUB - User or system declared subscripts, including the
           standard subscripts: SUB1, SUB2, ..., SUB10.
    $S - System added subscripts: $1, $2, ..., $10.
    SSI - System loop variables: $I1, $I2, ..., $I10.

XMAINASS - Contains a pointer to the storage of the statement which
defines the entity.

XNRECS - This count is meaningful only for file entities and holds the
number of different record types contained in the file.

XPARFILE - Holds the node number of the parent file entity for all input
and output data items.

XPAREC - For data items below the record level this field holds the node
number of their parent record entity.

XINP - Is '1'B if the entity is in an input file, and '0'B otherwise.

XDUP - Is '1'B if the entity is in an output file, and '0'B otherwise.

XISAM - Is '1'B if the entity is an ISAM file, and '0'B otherwise.

XKEYED - Is '1'B if the data entity is in a file for which a key name
was specified.

XLEN_DAT - The length in bytes of the data entity.

XREPTNG - Is '1'B if the data entity is repeating.

XVARYREP - Is '1'B if the data entity has a varying number of
repetitions.

XMAX_REP - The maximal number of repetitions which was declared for the
data entity. If no maximal repetition is declared, XMAX_REP is
set to 1.

XVARS - Is '1'B if the entity contains a descendant below the record
level and the descendant has a variable structure.

XSUBREC - Is '1'B if the data entity is a member of some record type.

XISSTARRED - Is '1'B if the data entity is repeating and has an
undetermined repetition.

XFATHER - The node number of the data entity which is one level above
the current entity in the data structure.

XSON1 - The node number of the leftmost descendant of the current
entity.

XBROTHER - The node number of the immediate right neighbor of the
current entity in the data structure.

XENDB - The node number of the control variable END.X if the current
entity is X.

XEXISTB - The node number of the control variable SIZE.X if the current
entity is X.

XVIR_DIM - The conceptual (virtual) dimensionality of the entity.

XSUBSLST - A pointer to the node subscript list associated with the
entity.

X#SUCCESSORS - The number of edges in the XSUCC_LIST.

XSUCC_LIST - A pointer to the list of edges emanating from the current
entity.

X#PREDECESSORS - The number of edges in the XPRED_LIST.

XPRED_LIST - A pointer to the list of edges coming into the current
entity.


4.2.2  THE  ARRAY  GRAPH 

The  Array  Graph  is  a  directed  graph  which  represents  the  precedence 
relationships  among  program  events.  The  nodes  in  the  Array  Graph  are 
the  program  events  and  the  edges  are  the  precedence  relationships.  One 
program  event  in  the  Array  Graph  will  correspond  to  one  program  entity. 
Thus  the  nodes  in  the  Array  Graph  correspond  to  the  program  entities  in 
the  dictionary.  The  edges  between  nodes  are  stored  in  edge  lists 
associated with those nodes. The attribute SUCC_LIST of a node contains
a list of edges emanating from it and the attribute PRED_LIST contains a
list of edges terminating at this node. We can thus find the successors
as well as the predecessors of any node.
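
A minimal Python sketch of this double bookkeeping is given here. It is illustrative only: the class `ArrayGraph`, its method names and the edge-type labels are hypothetical, and nodes are identified by their node numbers.

```python
from collections import defaultdict


class ArrayGraph:
    """Each edge is recorded in the successor list of its source node
    and in the predecessor list of its target node."""

    def __init__(self):
        self.succ_list = defaultdict(list)   # node -> edges leaving it
        self.pred_list = defaultdict(list)   # node -> edges entering it

    def add_edge(self, source, target, edge_type):
        edge = (source, target, edge_type)
        self.succ_list[source].append(edge)
        self.pred_list[target].append(edge)

    def successors(self, node):
        return [target for _, target, _ in self.succ_list[node]]

    def predecessors(self, node):
        return [source for source, _, _ in self.pred_list[node]]


g = ArrayGraph()
g.add_edge(1, 2, "hierarchical")   # e.g. a file precedes its record
g.add_edge(2, 3, "dependency")     # e.g. a field feeds an assertion
print(g.successors(1), g.predecessors(3))
```

Storing every edge twice costs some space but lets both directions be traversed without scanning the whole graph.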

The nodes in the Array Graph are compound nodes, i.e., an entire
array of data is represented by one node. Also each assertion is
represented by one node, independently of how many array elements it
defines. The range of each dimension of a compound node is stored in
the node subscript list associated with the node. The edges in the
Array Graph are compound edges which denote arrays of relations between
two compound nodes. With each edge are also stored the types of
subscript expressions used in the relations between the source and the
target node of the edge. The meaning of the Array Graph is made more
precise by considering the corresponding Underlying Graph (UG), where
every array element is represented by one node. An assertion node in
the Array Graph may be expanded in the UG into as many nodes as the
elements of the array which it defines. Edges are drawn between the
simple nodes. The UG may be an enormous graph which is impractical to
analyze. Moreover, the actual number of array elements is sometimes not
known until run time, in which case it is impossible to create the UG
of the specification. In contrast, the Array Graph is more compact and
easy to analyze.
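
The relation between a compound edge and its Underlying Graph edges can be made concrete with a small Python sketch. It assumes the ranges are known in advance and start at 1, and it models each subscript expression as a Python function of the target subscripts; the helper names `expand` and `underlying_edges` are hypothetical.

```python
from itertools import product


def expand(node, ranges):
    """All simple nodes of a compound node, given the range of each dimension."""
    return [(node, idx) for idx in product(*(range(1, r + 1) for r in ranges))]


def underlying_edges(source, target, ranges, subscript_exprs):
    """Expand one compound edge target(U1..Uk) <- source(J1..Jm).

    subscript_exprs holds one function per source dimension, each mapping
    the target subscripts (U1..Uk) to the value of Ji.
    """
    edges = []
    for _, u in expand(target, ranges):
        j = tuple(f(*u) for f in subscript_exprs)
        if all(x >= 1 for x in j):          # skip references below the range
            edges.append(((source, j), (target, u)))
    return edges


# A(I) depends on A(I-1) for a vector of 4 elements:
# one compound edge corresponds to three underlying edges.
edges = underlying_edges("A", "A", (4,), [lambda i: i - 1])
print(len(expand("A", (4,))), edges)
```

Even in this tiny case the expanded graph grows with the array size, while the Array Graph keeps a single node and a single edge.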


4.2.2.1 DATA STRUCTURE OF EDGES

Every edge from a node S to a node T has a uniform format:

                  t
T(U1, ..., Uk) <----- S(J1, ..., Jm)

where t is the type of the edge,
k is the dimensionality of node T,
m is the dimensionality of node S,
Ji, 1<=i<=m, are the subscript expressions appearing in
the ith dimension of node S,
Ui, 1<=i<=k, are the node subscripts associated with
the node T.

The subscripts U1, ..., Uk of the target node T are stored in the
attribute XSUBSLST of T in the dictionary. Therefore they are not
specified in the edge. In the later discussion, a type 4 subscript
expression Ji will be indicated by an '*' in the ith dimension of the
source node.

An  edge  is  represented  by  the  following  data  structure: 

SOURCE  :  The  source  node  of  the  edge. 

TARGET  :  The  target  node  of  the  edge. 

EDGE_TYPE : The type of the edge.

DIMDIF  :  The  difference  between  the  dimensionality  of  the  target 
node  and  the  source  node. 

SUBX  :  A  pointer  to  the  subscript  expression  list  (Jl,...,Jm). 
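
The edge record can be pictured as the following Python sketch. The field names follow the list above in lower case, but the representation is an assumption: the subscript expression list is shown as a tuple of strings instead of a linked list, the target dimensionality is passed in directly, and DIMDIF is assumed to mean target dimensionality minus source dimensionality.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Edge:
    source: int               # node number of the source node S
    target: int               # node number of the target node T
    edge_type: int            # the type t of the edge
    subx: Tuple[str, ...]     # subscript expressions (J1, ..., Jm)
    target_dims: int          # k, the dimensionality of T

    @property
    def dimdif(self) -> int:
        # Assumed sign convention: k - m.
        return self.target_dims - len(self.subx)


# T(U1, U2) <- S(U1): a two-dimensional target using a one-dimensional source.
e = Edge(source=7, target=9, edge_type=1, subx=("U1",), target_dims=2)
print(e.dimdif)
```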


4.2.2.2 DATA STRUCTURE OF SUBSCRIPT EXPRESSION LIST

A subscript expression Ji can be classified into one of seven
categories according to its composition (refer to
section 3.3.2). A type 4 subscript expression is referred to later as a
general subscript expression. Types 5, 6, and 7 subscript expressions
are added for the efficient implementation of some list type
functions [PNPR 80]. They are basically of the form X(I), where X is a
variable used to subscript another variable B, as in B(X(I)). This form
of subscript expression is referred to as indirect indexing. The array
used in indirect indexing must be integer valued with non-negative
entries. The system will analyze indirect subscripts only if the
indirect indexing array X(I) is sublinear, namely if it:

a) Is monotonic, i.e., if I>J then X(I) >= X(J).

b) Grows more slowly than I, i.e., X(I) <= I.

The system can test the indirect indexing array automatically to
determine if it is sublinear by the following simple criteria. In the
assertion that defines the indirect indexing array X(I), the value of the
right hand side must be either 0 or 1 for I=1 and must be equal to
X(I-1) or X(I-1)+1 for I>1. Thus the system will examine the assertion
to check if it is in the form:

X(I) = IF I=1 THEN (1 | 0)
       ELSE (X(I-1) | X(I-1)+1);
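
The two sublinearity conditions, and the reason the syntactic form guarantees them, can be checked on concrete data with a short Python sketch. Note that the Processor tests the form of the assertion, not run-time values; the functions `is_sublinear` and `build` here are hypothetical helpers for illustration.

```python
def is_sublinear(x):
    """x[0], x[1], ... hold X(1), X(2), ...; test the stated conditions."""
    non_negative = all(v >= 0 for v in x)
    monotonic = all(x[i] >= x[i - 1] for i in range(1, len(x)))
    slow = all(v <= i for i, v in enumerate(x, start=1))   # X(I) <= I
    return non_negative and monotonic and slow


def build(increments):
    """Follow the recognized form: X(1) is 0 or 1, and each later
    X(I) is either X(I-1) or X(I-1)+1."""
    x = []
    for step in increments:
        if step not in (0, 1):
            raise ValueError("each step must be 0 or 1")
        x.append(step if not x else x[-1] + step)
    return x


x = build([1, 0, 1, 1, 0])
print(x, is_sublinear(x))
print(is_sublinear([1, 3, 3]), is_sublinear([2, 1]))
```

Any array produced by `build` starts at most at 1 and rises by at most 1 per step, so it never decreases and never exceeds its index.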

An element in a subscript expression list is defined by the
following data structure:

NXT_SUBL : A pointer to the next element of the list.

LOCAL_SUB$ : If the subscript expression is of the form Uq[-c] or
X(Uq[-c])[-k], then LOCAL_SUB$ is q, i.e. the ordinal number of
the subscript Uq as it appears in T(Uk, ..., U1).

APR_MODE : The type of subscript expression.

INXVEC : The node number of the indirect indexing vector X if the
APR_MODE is 5, 6, or 7. Otherwise, 0.


4.3 CREATION OF THE DICTIONARY (CRDICT)

The procedure CRDICT analyzes the statements of the specification
and enters all the program entities into the dictionary. To find all
the data entities we start from the top level of data structures and
then trace down the structures. The structures whose root is a file
listed in the SOURCE FILE or TARGET FILE statements of the program
header are considered external files, i.e. input files or output files.
If a data structure is not part of any input or output file, it is
considered an interim variable, which is computed like any variable in
an output file but not written to external storage.

Corresponding to each input or output file, there is a file entity
entered into the dictionary. If a file named F serves both as a
source and a target file, then two file entities named OLD.F and NEW.F
will be entered into the dictionary. Starting from the file entity we
can find its immediate descendants from the file description statement,
and the descendants' names will be prefixed by the file entity's name.
If the root of a data structure is not a file, we will consider INTERIM
as its file name and all the descendants will be put into the
dictionary, too.


As we analyze a data structure, we also construct a tree
representation for it. For every data node we store pointers to its
father, leftmost son, and younger (i.e. immediately to its right)
brother in the attributes XFATHER, XSON1, and XBROTHER respectively. We
illustrate this with an example in Fig. 4.2.

X IS GROUP (Y,Z);
Y IS FIELD;
Z IS FIELD;

X = XFATHER(Y)
X = XFATHER(Z)
Y = XSON1(X)
Z = XBROTHER(Y)

Fig. 4.2 Tree representation of data structure
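
The father, leftmost-son and brother links of Fig. 4.2 can be derived mechanically from the member lists. The Python sketch below is an illustration under assumptions: names are used in place of node numbers, and the functions `link_tree` and `children` are hypothetical.

```python
def link_tree(groups):
    """groups maps each parent name to its ordered list of members.

    Returns three dictionaries playing the roles of XFATHER, XSON1
    and XBROTHER.
    """
    father, son1, brother = {}, {}, {}
    for parent, members in groups.items():
        if members:
            son1[parent] = members[0]            # leftmost son
        for i, member in enumerate(members):
            father[member] = parent
            if i + 1 < len(members):
                brother[member] = members[i + 1]  # immediate right neighbor
    return father, son1, brother


def children(name, son1, brother):
    """Enumerate the members of a node by following son and brother links."""
    child = son1.get(name)
    while child is not None:
        yield child
        child = brother.get(child)


father, son1, brother = link_tree({"X": ["Y", "Z"]})
print(father, son1, brother, list(children("X", son1, brother)))
```

Three links per node are enough to walk the whole structure in either direction, whatever the number of members.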


After all the data entities are entered into the dictionary, a
simplified name is derived for every data entry. If the file name
component can be omitted from the full name without causing any
ambiguity, the simplified name is the reduced name. Otherwise the
simplified name is the same as the full name.
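
The derivation of simplified names amounts to checking whether a data name occurs under more than one file. A minimal Python sketch follows, assuming full names are given as (file name, data name) pairs and that components are joined with a period; the function `simplified_names` is hypothetical.

```python
from collections import Counter


def simplified_names(full_names):
    """Map each (file name, data name) pair to its simplified name."""
    counts = Counter(data for _, data in full_names)
    return {
        (file, data): data if counts[data] == 1 else f"{file}.{data}"
        for file, data in full_names
    }


names = simplified_names([
    ("OLD.F", "PRICE"),
    ("NEW.F", "PRICE"),
    ("INTERIM", "TOTAL"),
])
print(names)
```

PRICE occurs in two files and keeps its file qualifier, while TOTAL is unambiguous and is reduced to the bare data name.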

Other types of program entities such as module name, assertions,
and subscript variables are each defined by a specific type of
statement, and there is a one-to-one correspondence between the
statements and the entities. We can retrieve these types of statements
from the associative memory and enter the entities into the dictionary.

Finally we will put control variables into the dictionary. For
each type of qualifier keyword, we find from the program specification
all the qualified names with that qualifier. Next we search the
dictionary for the suffix name. If the suffix is a declared data
entity, the full name of the control variable is formed from the full
name of the associated data entity. Otherwise, the qualified name is an
unrecognizable symbol and is reported as such to the user.


4.4 CREATION OF ARRAY GRAPH

4.4.1 ENTER HIERARCHICAL RELATIONSHIPS (ENHRREL)


The data stored in external sequential files are simply a string of
bits. The use of data description statements allows the user to treat
them as structured. Therefore, the system has to transform the data
files from a linear form to the structured form which is described by
the user. For this purpose, we envisage that there are two program
events corresponding to each data entity, one for opening the data and
the other for closing the data. The sequential order of data in the
external file requires that these opening and closing events be
arranged in a strict order. The precedence relationship among these
program events can be established as follows. If a data entity contains
some members, then its opening event precedes the opening event of its
first member and its closing event follows the closing event of its
last member. In addition, the closing event of its nth member precedes
the opening event of its (n+1)th member. In the case that a data entity
is repeating, the closing event of its (n-1)th instance precedes the
opening event of its nth instance. Fig. 4.3 shows the precedence
relationship of a sequential file. Because the data node B is
repeating, there is an edge from the (n-1)th instance of the closing
event of node B to the nth instance of the opening event of node B. The
edge is shown as a dashed line. The existence of this feedback edge
causes a cycle in the Array Graph, and this cycle assures us that the
reading of an instance of the field D will be followed by the reading
of an instance of E. It should be noted that the subscript expression
associated with the edge from the event C.B to the event O.B is of the
form I-1, which allows us to remove it and break the cycle during the
scheduling phase.


A IS FILE (B(*),C(*)) ;
B IS RECORD (D,E) ;
C IS RECORD (F,G) ;
D,E,F,G ARE FIELD ;

Fig. 4.3 Precedence relationship of a data structure


We envisage that for each field entity there is a third node which
corresponds to the available event of the data. The opening event of an
input field must precede its available event, and the closing event of
an output field should follow its available event.

This view assures us that we can always read the input files
sequentially and store them in the main memory before any computation
starts. If there are variable structures, i.e., structures of varying
field length or varying number of repetitions, then we may have to
include some assertions in the reading process. Afterwards we can do
all the computation internally, conforming with the constraint of data
dependency which is implied by the assertions. At the end, all the
fields in the output files are available and the information for
controlling the variable structure is available, too. We then take the
data from main memory, assemble them into records, and write the records
sequentially.

Actually  we  have  in  the  Array  Graph  only  one  node,  instead  of  the 
open,  close,  and  available  nodes  mentioned  above,  for  each  data  entity, 
as this helps compiler efficiency. For input files, we can view the
nodes as corresponding to the opening events. For output files, the
nodes correspond to the closing events. The records stored in a
sequential file have to be accessed in a strict order. Therefore, there
are precedence relationships among the data entities of an input or
output file to assure that the records are accessed in the proper order.
On  the  other  hand,  a  record  is  composed  of  fields.  The  membership 
relation  between  a  record  and  its  constituent  fields  implies  a 
precedence  relationship,  i.e.  no  field  in  an  input  record  will  be 
available  until  the  record  is  read  in.  Similarly  all  the  fields  in  an 
output  record  should  be  available  before  the  record  can  be  written  out. 

We will use the following definitions in discussing tree
structures.

Definition For a data entity G, SON1(G) denotes its leftmost son.

Definition For a data entity G, RSON(G) denotes its rightmost son.

Definition For a data entity G, CEB(G) denotes the closest elder brother
of G, i.e. the data entity which is to the immediate left of G
among all the brothers of G.

Definition For a data entity G, CYB(G) denotes its closest younger
brother, i.e. the data entity which is to the immediate right of G
among all the brothers of G.

Definition For any tree with node G as the root, RDM(G) denotes the
rightmost node on the frontier of the tree.

Definition For any tree with node G as the root, LDM(G) denotes the
leftmost node on the frontier of the tree.
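
The six definitions above can be written as small functions over a tree. The sketch below is illustrative only: it stores the tree of Fig. 4.3 in two hypothetical dictionaries, CHILDREN and FATHER, and uses lower-case function names that mirror the definitions. A leaf is treated as the only frontier node of its own tree.

```python
from typing import Dict, List, Optional

# A IS FILE (B(*),C(*)) ; B IS RECORD (D,E) ; C IS RECORD (F,G) ;
CHILDREN: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}
FATHER: Dict[str, str] = {c: p for p, kids in CHILDREN.items() for c in kids}


def son1(g: str) -> Optional[str]:
    """Leftmost son of g, if any."""
    return CHILDREN[g][0] if CHILDREN[g] else None


def rson(g: str) -> Optional[str]:
    """Rightmost son of g, if any."""
    return CHILDREN[g][-1] if CHILDREN[g] else None


def ceb(g: str) -> Optional[str]:
    """Closest elder brother: the sibling immediately to the left."""
    if g not in FATHER:
        return None
    sibs = CHILDREN[FATHER[g]]
    i = sibs.index(g)
    return sibs[i - 1] if i > 0 else None


def cyb(g: str) -> Optional[str]:
    """Closest younger brother: the sibling immediately to the right."""
    if g not in FATHER:
        return None
    sibs = CHILDREN[FATHER[g]]
    i = sibs.index(g)
    return sibs[i + 1] if i + 1 < len(sibs) else None


def rdm(g: str) -> str:
    """Rightmost node on the frontier of the tree rooted at g."""
    while CHILDREN[g]:
        g = CHILDREN[g][-1]
    return g


def ldm(g: str) -> str:
    """Leftmost node on the frontier of the tree rooted at g."""
    while CHILDREN[g]:
        g = CHILDREN[g][0]
    return g
```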


The precedence relationships in the different file types are discussed
below.

1)  Input  sequential  file.  Since  the  records  in  a  sequential  file  are 
read  in  one  at  a  time,  the  precedence  relationship  needs  to  assure 
that  the  records  are  read  in  the  order  they  are  present  in  the  input 
file.  A  record  may  be  composed  of  many  fields.  Therefore,  after  a 
record  is  read,  it  should  be  unpacked  to  get  all  the  fields.  If  the 
records in a file are not unpacked in the order they are read, then
we  will  need  memory  space  to  store  the  records.  Therefore,  it  is 
advantageous  to  unpack  the  records  when  they  are  read  in.  This 
implies  that  all  the  fields  in  a  sequential  file  will  become 
available in the order they occur in the external file. Three kinds
of edges are drawn among the data nodes in an input sequential file.

a) Assume that a data node G is n dimensional. If SON1(G) exists and
is m dimensional, where m may be either n or n+1, then the
following edge is drawn.

SON1(G)(J1,...,Jm) <-1a- G(J1,...,Jn)

b) Assume that a data node G is n dimensional and FATHER(G) is k
dimensional, where k may be either n-1 or n depending on whether
node G repeats or not. If CEB(G) exists and RDM(CEB(G)) is m
dimensional, then the following edge is drawn.

G(J1,...,Jn) <-1b- RDM(CEB(G))(J1,...,Jk,*,...,*)

c) Assume that a data node G is n dimensional. If it is repeating,
then the following edge is drawn, where Jn-1 is the subscript Jn
decreased by one, selecting the previous instance of G.

G(J1,...,Jn) <-1c- RDM(G)(J1,...,Jn-1,*,...,*)

If  a  data  node  in  an  input  sequential  file  corresponds  to  the 
opening  event  of  that  data,  we  can  interpret  the  above  edges  in  the 
following way. The edges of type 1a say that a higher level data
instance should be ready before all of the data instances
corresponding to the first member of it can be read. The edges of
type 1b say that all the brothers within the same instance of their
father should be read in the order they are declared in the data
structure. The edges of type 1c say that if a data node is
repeating, then one instance of it is not ready to be read until the
last field in the previous instance of it is read.

2)  Output  sequential  file.  The  records  of  an  output  sequential  file 
should be written out in a strict order. There may be several fields
in a record; therefore, we may have to pack the fields before
writing. Packing the fields when they become available is convenient
for  the  code  generation  but  poses  extra  restrictions  on  scheduling 
the  assertions .  For  example,  suppose  a  record  node  R  contains  three 
fields  A,  B,  and  C.  If  we  insist  that  fields  A,  B,  and  C  should  be 
available  in  that  order,  the  user  would  not  be  able  to  define  the 
value  of  A  in  terms  of  C.  Therefore,  at  or  above  the  record  level 
the  precedence  relationship  requires  that  the  records  be  written  in 
strict  order  but  below  record  level  the  precedence  relationship  will 
only  require  that  the  constituent  fields  of  a  record  are  ready  before 
the  record  is  written.  Therefore,  fields  in  a  record  do  not  have  to 
be computed in the order they are packed into the record.

Three  kinds  of  edges  are  drawn  among  the  data  entities  above  and 
including  the  record  level  of  an  output  sequential  file. 

a) Assume that G is an n dimensional data entity above the record
level and RSON(G), i.e. the rightmost son of G, is m
dimensional. The following edge is drawn from RSON(G) to G.

G(J1,...,Jn) <-2a- RSON(G)(J1,...,Jn,*)

b) If node G has a younger brother, then an edge will be drawn from
node G to LDM(CYB(G)). Let G be an n dimensional node, FATHER(G)
be a k dimensional node, and LDM(CYB(G)) be an m dimensional node.
The edge to be drawn is as follows.

LDM(CYB(G))(J1,...,Jk,...,Jm) <-2b- G(J1,...,Jk,*)

c) If node G is repeating, then the following edge is drawn from G to
LDM(G). Let G be an n dimensional node and LDM(G) be an m
dimensional node.

LDM(G)(J1,...,Jn,...,Jm) <-2c- G(J1,...,Jn-1)


If we imagine that a data node in an output sequential file
corresponds to the closing event of that data, then the edges
mentioned above have the following interpretation. An edge of type
2a  says  that  a  data  instance  can  be  written  out  only  after  all  the 
data  instances  corresponding  to  its  last  son  are  written  out.  An 
edge  of  type  2b  says  that  all  the  instances  of  an  elder  brother 
within  the  same  father  instance  should  be  written  before  any  instance 
of  its  younger  brother  can  be  written.  An  edge  of  type  2c  says  that 
if  a  data  node  is  repeating,  then  an  instance  of  it  cannot  begin  to 
be  written  until  the  previous  instance  is  completely  written. 

Below the record level in an output file, the precedence
relationship assures that a record will not be written out until all
of its constituent fields are available. However, the relative order
in which the fields are computed is not restricted. We will simply
draw edges from all the descendants of a record node to it. Fig. 4.4
illustrates the edges in an output sequential file.


A IS FILE (B(*),C(*)) ;
B IS RECORD (D,E) ;
C IS RECORD (F,G) ;
D,E,F,G ARE FIELD ;

Fig. 4.4 The edges in an output sequential file

3) An input ISAM file. In an ISAM file, there is only one type of
record. The dimensionality of the record node IR is the same as that
of the associated control variable POINTER.IR. Since the record
instances are accessed with the keys, it is possible to read the
records in the order of the keys. If the ISAM file is a pure source
file to the program, the keys in the POINTER.IR array can be used in
any order. On the other hand, if the ISAM file is used as a source
and target file, the records should be processed in a sequential way;
therefore, the keys in the POINTER array should be used sequentially
to access the records. Below the record level, we have precedence
relationships similar to those in a SAM file because we may have
to unpack the fields.

4)  An  output  ISAM  file.  If  an  ISAM  file  is  a  pure  target  file,  the 
output  records  will  be  added  to  the  file.  If  it  is  a  source  and 
target  file  to  the  program,  then  only  the  selected  records  may  be 
updated.  In  order  to  assure  that  each  updated  record  includes  the 
effects  of  previous  updates,  we  will  have  to  update  and  write  out  a 
record  before  the  next  record  is  read  in.  Therefore,  the  keys  in  the 
POINTER  array  should  be  used  sequentially.  However  the  fields  in  an 
output  record  can  be  computed  in  any  order.  Below  record  level  the 
precedence  relationships  only  reflect  the  membership  of  the  fields 
within  the  record. 

5)  Interim  variable.  There  are  no  I/O  actions  concerning  interim 
variables.  They  are  stored  in  main  memory  and  referenced  as  fields. 
Therefore,  there  is  no  relative  precedence  relationship  among  the 
interim  fields.  But  we  still  draw  edges  which  reflect  the  membership 
among the data entities to facilitate range propagation (refer to
Chapter 5). Since an interim variable is considered to be part of an
output  file  except  that  it  will  not  be  written  out,  the  edges  are 
drawn  from  the  descendants  to  the  ancestors. 
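
For case 1 above (an input sequential file), the three kinds of edges can be generated by one walk over the data tree. The following is a simplified sketch: it records only which node each edge leaves and enters, and omits the subscript expressions and dimensionalities. The names CHILDREN, REPEATING, and input_sequential_edges are hypothetical, and the sample tree is the one declared in Fig. 4.3.

```python
from typing import Dict, List, Set, Tuple

# A IS FILE (B(*),C(*)) ; B IS RECORD (D,E) ; C IS RECORD (F,G) ;
CHILDREN: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}
REPEATING: Set[str] = {"B", "C"}  # nodes declared with (*)


def rdm(g: str) -> str:
    """Rightmost node on the frontier of the tree rooted at g."""
    while CHILDREN[g]:
        g = CHILDREN[g][-1]
    return g


def input_sequential_edges(root: str) -> List[Tuple[str, str, str]]:
    """Return (target, edge kind, source) triples for the tree under root."""
    edges: List[Tuple[str, str, str]] = []
    stack = [root]
    while stack:
        g = stack.pop()
        kids = CHILDREN[g]
        if kids:
            edges.append((kids[0], "1a", g))           # SON1(G) <-1a- G
        for elder, younger in zip(kids, kids[1:]):
            edges.append((younger, "1b", rdm(elder)))  # G <-1b- RDM(CEB(G))
        if g in REPEATING:
            edges.append((g, "1c", rdm(g)))            # G <-1c- RDM(G), previous instance
        stack.extend(reversed(kids))
    return edges


edges = input_sequential_edges("A")
```

For this tree the type 1c edges B <- E and C <- G are the feedback edges: each new instance of a record waits for the last field of the previous instance.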


4.4.2  ENTER  DEPENDENCY  RELATIONSHIPS  (ENEXDP) 

Two  types  of  assertions,  namely  simple  assertion  and  conditional 
assertion,  may  be  used  to  define  the  values  of  interim  variables  and 
output  variables.  The  execution  of  an  assertion  depends  on  the 
availability  of  all  of  its  source  variables,  and  its  execution  makes  the 
target  variable  available.  This  is  because  a  data  entity  must  be 
defined  before  it  is  referenced  and  a  data  entity  becomes  available 
after the assertion in which it is the target variable is executed.

Procedure ENEXDP examines all the assertions twice. In the first
pass, it checks whether the target variable of an assertion defines a
sublinear function and can be used as an indirect indexing vector or
not. An indirect indexing array should be defined by an assertion of
the following form.

X(I) = IF I=1 THEN (0 | 1)
       ELSE (X(I-1) | X(I-1)+1) ;
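
The effect of this form is that X starts at 0 or 1 and grows by at most one per step. The sketch below checks that property on a concrete list of values. It is only an illustration of what the form guarantees: ENEXDP itself examines the text of the assertion, not run-time values, and the function name is_sublinear is hypothetical.

```python
from typing import Sequence


def is_sublinear(x: Sequence[int]) -> bool:
    """True if x[0] is 0 or 1 and each later value adds 0 or 1 to the previous one."""
    if not x:
        return True
    if x[0] not in (0, 1):
        return False
    return all(b - a in (0, 1) for a, b in zip(x, x[1:]))
```

A vector such as 0, 1, 1, 2, 3 passes, so its values visit every index up to the last one without gaps, which is what makes it usable for indirect indexing.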

During the second pass, it analyzes every assertion and enters the
precedence relations caused by explicit data dependency into the Array
Graph. Given a simple assertion, the left hand side of it is scanned to
find the target variable. Then the expression on the right hand side is
scanned to find all the source variables. For a conditional assertion,
the THEN parts, ELSE parts, and the conditional expression parts are
scanned in that order to find all the source and the target variables.
The source variables in a conditional assertion are found in the
conditional expressions, the THEN parts, and the ELSE parts. For every
source variable an edge is drawn from it to the assertion node. It
should be noted that one assertion defines one target variable only and
no more than one target variable can appear in a conditional assertion.

The edge from the source variable to the assertion is of EDGE_TYPE
3 and the edge from the assertion to the target variable is of EDGE_TYPE
7. The DIMDIF is the dimensionality difference of the target node and
the  source  node  of  the  edge.  The  types  of  the  subscript  expressions  of 
a  source  variable  are  stored  in  the  subscript  expression  list  associated 
with  the  edge.  It  should  be  noted  that  the  subscript  expressions  of  the 
target  variable  define  a  mapping  from  the  node  subscripts  of  the  target 
variable  to  the  node  subscripts  of  the  assertion.  Because  the  edge 
corresponding  to  the  occurrence  of  the  target  variable  is  drawn  from  the 
assertion  node  to  the  target  variable,  instead  of  from  the  target 
variable  to  the  assertion  node,  the  mapping  should  be  inverted  to  form 
the  subscript  expression  list  of  the  edge.  In  Fig.  4.5  the  data 
dependency  of  an  assertion  is  shown.  Notice  that  there  is  a  list  of 
subscripts  associated  with  every  node  in  the  graph.  For  example, 
variable  A  is  a  two  dimensional  array.  Subscripts  <A,1>  and  <A,2> 
correspond to the first and second dimension of array A. The edge
leading from node A to a1 has a subscript expression list associated
with it. The subscript expressions are ordered in the way they are used
in the subscripted variable A(I,J-1).

a1: C(I,J) = A(I,J-1) + B(I,4) ;


Fig. 4.5 The data dependency of an assertion
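
The edges that the second pass derives from the assertion of Fig. 4.5 can be sketched with a small scanner. This is a simplification under stated assumptions: it handles only simple (non-conditional) assertions whose variables are all subscripted, it records the subscripts of the target as written rather than inverting the mapping, and the names VAR and assertion_edges are hypothetical.

```python
import re
from typing import List, Tuple

# A name followed by a parenthesised subscript list, e.g. A(I,J-1).
VAR = re.compile(r"([A-Z][A-Z0-9_.]*)\(([^()]*)\)")

Edge = Tuple[str, str, int, List[str]]  # (source, target, EDGE_TYPE, subscripts)


def assertion_edges(label: str, text: str) -> List[Edge]:
    """Scan a simple assertion and return its type 3 and type 7 edges."""
    lhs, rhs = text.rstrip(" ;").split("=", 1)
    target = VAR.search(lhs)
    if target is None:
        raise ValueError("no subscripted target variable found")
    edges: List[Edge] = []
    for name, subs in VAR.findall(rhs):
        # Source variable -> assertion node.
        edges.append((name, label, 3, [s.strip() for s in subs.split(",")]))
    # Assertion node -> target variable.
    edges.append(
        (label, target.group(1), 7, [s.strip() for s in target.group(2).split(",")])
    )
    return edges


edges = assertion_edges("a1", "C(I,J) = A(I,J-1) + B(I,4) ;")
```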


In  addition  to  the  explicit  data  dependency  found  in  an  assertion, 
there  exists  some  implicit  data  dependency  between  the  data  entities  and 
their associated control variables. Let TROT denote the name of a data
entity and NODE denote the name of the associated control variable, which
is composed of a keyword PREFIX followed by the name of the data entity.

1. If PREFIX = 'POINTER', then verify that TROT is a keyed record and
draw an edge.

TROT <-5- POINTER.TROT, DIMDIF = 0 .

2. If PREFIX = 'SIZE', then verify that TROT is repeating and draw an
edge.

TROT(I) <-13- SIZE.TROT, DIMDIF = 1 .

3. If PREFIX = 'END', then verify that TROT is repeating and draw an
edge.

TROT(I) <-14- END.TROT(I-1), DIMDIF = 0 .

4. If PREFIX = 'FOUND', then verify that TROT is a keyed record and
draw an edge.

FOUND.TROT <-15- TROT, DIMDIF = 0 .

5. If PREFIX = 'NEXT', then verify that TROT is a field in an input
sequential file and draw an edge.

NEXT.TROT <-16- TROT, DIMDIF = 0 .

6. If PREFIX = 'SUBSET', then verify that TROT is an output record.
If it is an output record, then draw the following edge.

TROT <-17- SUBSET.TROT, DIMDIF = 0 .

7. If PREFIX = 'LEN', then we draw an edge.

TROT <-20- LEN.TROT, DIMDIF = 0 .

The subscript expression lists of these edges are for the moment
empty. They will be constructed by the procedure FILLSUB later
according to the EDGE_TYPE.
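
The seven cases above amount to a lookup table keyed by the qualifier keyword. The sketch below encodes the edge type, the direction, and DIMDIF as listed in the text. The verification steps (keyed record, repeating, and so on) are left out, and the names EdgeRule, RULES, and control_edge are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class EdgeRule(NamedTuple):
    edge_type: int
    control_is_source: bool  # True: edge runs from the control variable to the data entity
    dimdif: int


RULES: Dict[str, EdgeRule] = {
    "POINTER": EdgeRule(5, True, 0),
    "SIZE": EdgeRule(13, True, 1),
    "END": EdgeRule(14, True, 0),
    "FOUND": EdgeRule(15, False, 0),
    "NEXT": EdgeRule(16, False, 0),
    "SUBSET": EdgeRule(17, True, 0),
    "LEN": EdgeRule(20, True, 0),
}


def control_edge(qualified_name: str) -> Tuple[str, str, int, int]:
    """Return (source, target, EDGE_TYPE, DIMDIF) for a name such as 'SIZE.X'."""
    prefix, _, entity = qualified_name.partition(".")
    rule = RULES[prefix]
    if rule.control_is_source:
        return (qualified_name, entity, rule.edge_type, rule.dimdif)
    return (entity, qualified_name, rule.edge_type, rule.dimdif)
```

For example, control_edge("SIZE.X") yields an edge from SIZE.X to X, while control_edge("FOUND.R") yields an edge from R to FOUND.R, matching the opposite directions shown in cases 2 and 4.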


4.5  FINDING  IMPLICIT  PREDECESSORS  (ENIMDP) 

Considerable effort has been made to make the MODEL language tolerate
some incompleteness and inconsistency in the specification. When
incompleteness or inconsistency is found, warning messages or
error messages are sent to the user. If practical, the MODEL processor
tries to correct the specification in a reasonable way.

If  an  interim  field  is  not  defined  by  any  assertion,  an  error 
message  is  sent  to  inform  the  user.  It  is  probable  that  the  user  forgot 
to  write  the  assertion.  Therefore,  the  system  should  request  an 
assertion  from  the  user.  However,  if  a  field  in  a  target  file  is  not 
defined  explicitly,  the  MODEL  processor  will  try  to  find  an  implicit 
source to define that field. The MODEL processor tolerates this kind of
incompleteness and saves the user the work of writing assertions for
merely copying fields from a source file to a target file.

Given  a  field  in  a  target  file  which  is  not  explicitly  defined  by 
any  assertion,  we  will  search  for  a  field  with  the  same  name  in  another 
file  according  to  the  following  order  of  priority.  The  idea  is  to  make 
some  reasonable  assumption  so  that  the  undefined  field  will  get  a  value. 
Rule 1: If the undefined field is in a file which is both a source and
target file, then the value in the corresponding field in the
old record is taken as the value for it.

Rule 2: If Rule 1 does not apply, then the processor tries to find a
same-named field in other source files. If one is found, it is
assumed to be the source. If more than one is found, then the
processor arbitrarily picks one as the source and prints a
message to indicate that there was ambiguity.

Rule 3: If the above are unsuccessful, the processor tries to find a
field with the same name in other output files. If one is
found, it is taken as the source, and if more than one is found,
then one is taken arbitrarily, with a corresponding message to
the user regarding the ambiguity.

In the above cases where an implicit predecessor is found
successfully, an assertion which defines the target variable by the
implicit predecessor is generated as if it were entered by the user.
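
The three rules can be read as a prioritized search. The sketch below is an illustration under assumptions: files are described by plain sets and a dictionary, the "arbitrary" pick is taken to be the alphabetically first candidate, and the name find_implicit_source together with the sample file names is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def find_implicit_source(
    field: str,
    target_file: str,
    sources: Set[str],
    targets: Set[str],
    fields: Dict[str, Set[str]],
) -> Optional[Tuple[str, bool]]:
    """Return (full name of the implicit source, ambiguity flag), or None."""
    # Rule 1: the file is both source and target, so use the old record.
    if target_file in sources:
        return (f"OLD.{target_file}.{field}", False)
    # Rule 2: other source files. Rule 3: other output files.
    for group in (sources, targets):
        candidates = sorted(
            f for f in group if f != target_file and field in fields[f]
        )
        if candidates:
            # Assumption: "arbitrarily" is modelled as the first name in sorted order.
            return (f"{candidates[0]}.{field}", len(candidates) > 1)
    return None


FIELDS = {
    "MASTER": {"ID", "BAL"},
    "TRANS": {"ID", "AMT"},
    "AUDIT": {"ID", "AMT"},
    "REPORT": {"ID", "AMT", "NOTE"},
    "LOG": {"NOTE"},
}
SOURCES = {"MASTER", "TRANS", "AUDIT"}
TARGETS = {"MASTER", "REPORT", "LOG"}
```

When the flag is true, the caller would print the ambiguity message described in Rules 2 and 3 before generating the copying assertion.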


4.6  DIMENSION  PROPAGATION  (DIMPROP) 

The source and the target variables in an assertion may be arrays.
In order to reference an element of an N dimensional array, the user
should subscript the array name with N subscript expressions. A
subscriptless dialect of the MODEL language allows the user to omit
subscripts in assertions in certain cases which do not lead to
ambiguity. Therefore, the number of subscript expressions following an
array variable does not necessarily indicate its actual dimensionality.
Furthermore, the declaration of a multi-dimensional interim array may be
simplified  by  omitting  the  data  description  statements  for  the  higher 
level  groups.  The  omission  of  subscript  expressions  in  assertions  and 
the  omission  of  the  higher  level  data  description  can  be  viewed  as 
incompleteness  or  inconsistency  of  the  specification.  However,  they  are 
tolerated  by  the  MODEL  processor,  and  a  process  called  dimension 
propagation  is  used  to  resolve  inconsistencies  of  the  dimensionality  for 
the  interim  variables  and  missing  subscripts  in  assertions. 

All  the  nodes  in  input  and  output  files  should  be  declared 
precisely,  using  data  description  statements .  Their  number  of 
dimensions  can  therefore  be  derived  directly  from  the  data  description 
statements.  Associated  with  every  edge  there  is  a  field  DIMDIF  which 
denotes  the  dimension  difference  between  the  source  and  the  target  nodes 
of  the  edge.  The  number  of  dimensions  of  a  node  can  be  propagated  along 
the  edges  of  the  Array  Graph. 

The  dimension  propagation  algorithm  is  briefly  described  in  the 
following.  Let  N  denote  the  set  of  nodes  in  the  Array  Graph,  array  C 
store  the  current  number  of  dimensions,  and  array  D  store  the  initially 
declared  number  of  dimensions  for  each  node  in  N.  A  queue  Q  keeps  all 
the  nodes  whose  calculated  dimension  could  possibly  be  changed . 

Algorithm 4.1 Dimension Propagation

Input. Array Graph.

Output. VIR_DIM: An attribute in the dictionary which contains the
number of dimensions of a node.

1. For each node n in N, let C(n) be D(n) and put node n in Q.

2. If Q is empty, then exit.

3. Pick a node n from Q and remove it from Q. Let dim be 0.

4. For every incoming edge from node s to n, let dim be the maximum of
dim and C(s)+DIMDIF.

5. For every outgoing edge from node n to t, let dim be the maximum of
dim and C(t)-DIMDIF.

6. If dim<=C(n), go to step 2.

7. Else, the node n has a new updated dimension. Let C(n) be dim.

8. For every incoming edge from node s to n, append s to Q.

9. For every outgoing edge from node n to t, append t to Q.

10. If more than N*N nodes have been taken from the queue, then halt and
issue an error message: there exists a propagation cycle. Otherwise go
to step 2.

If  the  process  converges,  then  every  node  will  have  a  finite 
dimension.  However,  it  is  possible  that  a  cycle  in  the  graph  causes  an 
endless  increase  in  the  dimensions.  Consider  for  example  the  following 
specification. 

(F, H) ARE FIELD ;

I IS SUBSCRIPT ;

IF I=1 THEN H(I) = 5 ; ELSE H(I) = F+1 ;

IF I=1 THEN F(I) = 6 ; ELSE F(I) = H+1 ;

The first assertion implies that the dimension of H is larger by 1
than that of F, i.e. C(H)>C(F). The second assertion states that
C(F)>C(H). Applying our algorithm to this specification will result in
an endless loop of alternately incrementing C(H) and C(F). In this case
the  system  will  send  out  an  error  message  indicating  that  the  dimension 
propagation  process  is  in  an  infinite  cycle  and  also  print  out  the  nodes 
involved  in  the  cycle. 
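
Algorithm 4.1 can be sketched directly in Python. This is a minimal illustration, not the Processor's implementation: the graph is given as a hypothetical list of (source, target, DIMDIF) triples, the result is returned as a dictionary instead of being stored in VIR_DIM, and the cycle is reported by raising an exception. The cyclic H/F example is modelled with two direct edges, leaving out the intermediate assertion nodes.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (source, target, DIMDIF)


def propagate_dimensions(declared: Dict[str, int], edges: List[Edge]) -> Dict[str, int]:
    current = dict(declared)          # step 1: C(n) = D(n)
    queue = deque(declared)           # step 1: every node starts in Q
    limit = len(declared) ** 2        # step 10: N*N bound
    taken = 0
    while queue:                      # step 2
        n = queue.popleft()           # step 3
        taken += 1
        if taken > limit:             # step 10
            raise ValueError("dimension propagation cycle")
        dim = 0
        for s, t, dimdif in edges:
            if t == n:                # step 4: incoming edge s -> n
                dim = max(dim, current[s] + dimdif)
            if s == n:                # step 5: outgoing edge n -> t
                dim = max(dim, current[t] - dimdif)
        if dim <= current[n]:         # step 6
            continue
        current[n] = dim              # step 7
        for s, t, _ in edges:
            if t == n:
                queue.append(s)       # step 8
            if s == n:
                queue.append(t)       # step 9
    return current


# Converging case: a 1-dimensional X feeds assertion a1, which defines Y.
result = propagate_dimensions(
    {"X": 1, "a1": 0, "Y": 0}, [("X", "a1", 0), ("a1", "Y", 0)]
)
```

In the converging case the dimension of X is carried forward to a1 and Y. With the edges ("F", "H", 1) and ("H", "F", 1) each node keeps raising the other, and the N*N bound stops the loop.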


4.7  FILLING  MISSING  SUBSCRIPTS  IN  ASSERTIONS  (FILLSUB) 

In  the  dimension  propagation  phase  we  have  determined  the  number  of 
dimensions  of  every  node.  If  the  number  of  dimensions  of  a  node  is 
larger  than  its  apparent  number  of  dimensions,  it  is  necessary  to  add 
the  respective  subscript  and  data  structures.  This  is  performed  in  the 
following  three  tasks. 

Task  1:  Generate  the  node  subscript  list. 

If the node X is a data node, its node subscript list is (displayed
here from last to first):

(FOR_EACH.Ak, ..., FOR_EACH.A1)

where Ak, ..., A1 is the list of the repeating ancestors of X in a top
down order. If X itself is repeating then A1 is equal to X.

If the node is an assertion node, then it has already been assigned
a partial subscript list by ENEXDP. This is the list of apparent
subscripts in the assertion, i.e. all the subscripts appearing either
on the L.H.S. or the R.H.S. of the assertion. Let the assertion be of
the form:

a1: A(Ik, ..., I1) = f(...) ;

Let the R.H.S. contain the subscripts J1, ..., Jm not appearing on the
L.H.S. and hence assumed to be reduced. Then the partial list assigned
to a1 is (Ik, ..., I1, Jm, ..., J1) and its apparent dimensionality is
determined to be d=k+m. As a result of the dimension propagation
process we may have recomputed a new dimensionality c for a1 where c>=d.
This will cause n=c-d new subscripts to be added to the subscript list
of a1, which now appears as:

($n, ..., $1, Ik, ..., I1, Jm, ..., J1)

where $1, ..., $n are the names of the new subscripts.

Task  2:  Fill  in  Missing  Subscripts  in  the  Assertions. 

Consider an instance of a subscripted variable A(Ij, ..., I1) in an
assertion. The calculated dimension VIR_DIM for array A yields a value
d which should be greater than or equal to j. If n=d-j>0 we should add n
new system generated subscripts $1 to $n, modifying the instance into
A($n, ..., $1, Ij, ..., I1). It should be noted that the new subscripts
are always added on the leftmost dimensions of the array variables.
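
Task 2 can be sketched in a few lines. The function name fill_subscripts and the representation of a subscript list as a Python list of strings are assumptions made for illustration.

```python
from typing import List


def fill_subscripts(apparent: List[str], vir_dim: int) -> List[str]:
    """Prepend system generated subscripts $n, ..., $1 to reach vir_dim entries."""
    n = vir_dim - len(apparent)
    if n < 0:
        raise ValueError("more subscripts than calculated dimensions")
    generated = [f"${i}" for i in range(n, 0, -1)]  # $n, ..., $1
    return generated + apparent
```

For an instance A(I,J) of an array whose calculated dimension is 4, fill_subscripts(["I", "J"], 4) gives the list for A($2,$1,I,J).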

Task 3: Fill in the Subscript Expression List for the Edges.

All the edges except types 3 and 7 have been generated with an
empty subscript expression list. Using the edge type and the dimensions
of its source and target nodes, we generate a subscript expression list
for each edge. Edges of type 3 and 7 have a partial subscript
expression list based on their apparent appearance in the assertion. It
may be necessary to expand this partial list. If n missing subscripts
have been added to the variables in an assertion, then it is necessary
to add n subscript expressions to the edges which correspond to the
instances of the variables in the assertion.


CHAPTER 5
RANGE  PROPAGATION 


5.1  INTRODUCTION 

The  structures  of  variables  are  declared  in  data  description 
statements.  Every  variable  is  considered  an  array  of  some  dimensions. 
The  number  of  elements  in  an  array  variable  is  determined  by  the 
dimensionality  of  the  array  and  the  sizes  of  each  of  the  array 
dimensions.  The  size  of  an  array  dimension  is  called  the  range  of  that 
dimension .  The  range  information  allows  us  to  allocate  memory  space  for 
the  array  variables  and  generate  iteration  control  statements  which  will 
define  every  element  in  the  arrays.  The  use  of  subscripts  in  assertions 
makes  it  possible  to  define  multiple  elements  of  an  array  through  one 
assertion.  We  can  instantiate  an  assertion  by  fixing  its  subscript 
values.  Then  every  instance  of  the  assertion  defines  one  single  data 
element.  The  ranges  of  the  assertion's  subscripts  restrict  the  number 
of  instances  of  an  assertion,  which  in  turn  defines  the  number  of  times 
that  the  assertion  will  be  executed.  The  ranges  of  array  dimensions  and 
assertion  subscripts  are  used  in  the  later  phases  to  synthesize  the 
program. 

Much  information  is  not  given  explicitly  in  the  specification.  For 
instance  users  are  allowed  in  assertions  to  use  free  subscripts  for 
which  the  range  is  not  specified.  Also  the  range  specifications  of  some 
array  dimensions  may  be  omitted.  Therefore  an  algorithm  is  needed  to 
derive  ranges  for  certain  assertion  subscripts  and  array  dimensions. 

There is yet another reason why we want to analyze the subscript
ranges. A criterion for placing a number of assertions in the scope of
one  loop  is  that  they  all  have  subscripts  of  the  same  range.  From  the 
point  of  view  of  program  optimization  it  is  preferred  to  have  the  loop 
scope  as  large  as  possible.  It  is  important  therefore  to  identify  the 
subscripts  of  the  same  range.  By  propagating  the  specified  range 
information  to  all  the  assertion  subscripts  and  array  dimensions  we  not 
only  find  the  ranges  which  have  been  incompletely  specified,  but  also 
identify the ranges which are equal.

5.2 LANGUAGE CONSTRUCTS FOR RANGE SPECIFICATION

A  multi-dimensional  array  is  declared  as  a  hierarchical  data 
structure  with  the  most  significant  dimension  specified  at  the  top 
level. The range of a dimension may not depend on the subscript value
of a less significant dimension. The range of an array dimension may be
specified in MODEL in several alternative ways, as follows:

(1)  Through  a  data  description  statement.  A  constant  number  of 
repetitions  of  a  data  structure  may  be  specified  in  the  data 
description  statement  which  describes  the  parent  structure. 

(2) By defining the value of a SIZE qualified control variable (refer to
section 3.4). For example, if group X repeats M times and M is
itself a variable, we may use the following assertion to specify its
range:

SIZE.X = M ;

A SIZE qualified variable is an interim variable whose dimensionality
is at most one less than that of the suffix variable. Its value is used
to define the range of the last dimension of the suffix variable
(i.e. X). Consider an N dimensional repeating group X. Assume
that the ranges of all its dimensions except the least significant
one are defined elsewhere. By definition, SIZE.X is at most an N-1
dimensional array, and the range of each of its dimensions is exactly
the same as the range of the corresponding dimension of data structure
X. Since the values in array SIZE.X can differ from one another,
the array X may not have a regular (i.e. rectangular) shape, but
may have "jagged edges." This can be stated formally as follows:


X(S_1, S_2, ..., S_k, ..., S_n) is in X iff

SIZE.X(S_1, ..., S_k) is in SIZE.X  &

1 <= S_n <= SIZE.X(S_1, ..., S_k)

(3) By defining the value of an END qualified control variable. The END
array is of boolean type. It determines the range of the least
significant dimension of the variable named in the suffix. Given an
N dimensional array X, the associated control array END.X has the
same structure as array X. The range of the Nth dimension is
defined as the smallest positive integer L_n which satisfies the
following conditions:

END.X(S_1, ..., S_{n-1}, L_n) = TRUE  &

END.X(S_1, ..., S_{n-1}, S_n) = FALSE,  for 1 <= S_n < L_n.

(4) By using a subscript declaration statement to define a global
subscript. A constant number of repetitions can be specified in
the statement. For example:

X IS SUBSCRIPT (20) ;

(5) By system default. A repeating data structure which is a rightmost
descendant and which is at or above the record level may be assigned
the end-of-file as its range if the user does not specify a range
for it.

The mechanisms of SIZE and END arrays are not totally redundant.
There are some essential differences between them.
First, the smallest range an END array can define is one, whereas a
SIZE array can define a range of zero. This is because the END array
must have at least one value of boolean true. Secondly, the range
specified by a SIZE array is finite, but the range specified by an END
array may be infinite (through a user error in the range defining
assertion, when there is no first boolean true condition). This is not
checked by the system. Thirdly, the range specified by array
SIZE.X(I1,...,Ik) may not depend on the array element X(I1,...,In),
while END.X(I1,...,In) may depend on X(I1,...,In). For example, let
X(1),...,X(k) be all the instances of a one dimensional array X whose
range is specified by SIZE.X = k. In the program, the value of SIZE.X,
i.e. k, must be computed before we compute any of the elements of X.
If an END control array is used, the range is specified by END.X(1),
..., END.X(k), and we only have to ensure that END.X(I-1) is computed
before X(I) for 1<I<=k.
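
The SIZE and END rules can be illustrated with a small Python sketch.
It is not part of the Processor: the names size_x, in_x and
range_from_end and the sample values are assumptions chosen for
illustration. SIZE.X is modeled as a table indexed by the first
subscript of a two dimensional group X, and END.X as a sequence of
booleans along the least significant dimension.

```python
# Illustrative only: SIZE.X(S1) for a 2-dimensional group X whose first
# dimension has range 3 (assumed sample values; a range of zero is legal).
size_x = {1: 2, 2: 0, 3: 4}


def in_x(s1, s2):
    """True iff X(s1, s2) exists: SIZE.X(s1) exists and 1 <= s2 <= SIZE.X(s1)."""
    return s1 in size_x and 1 <= s2 <= size_x[s1]


def range_from_end(end_values):
    """Smallest positive Ln with END.X(..., Ln) true, or None if there is none.

    end_values holds END.X(..., 1), END.X(..., 2), ... for fixed leading
    subscripts.
    """
    for position, flag in enumerate(end_values, start=1):
        if flag:
            return position
    return None


# All elements of the jagged array X, enumerated in subscript order.
elements = [(s1, s2) for s1 in sorted(size_x) for s2 in range(1, size_x[s1] + 1)]
```

In this sketch range_from_end returns None when no true value occurs,
which corresponds to the unchecked user error described above.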


5.3  DEFINITIONS 

Subscript  variables  belong  to  a  special  class  of  variables.  While 
an  ordinary  variable  can  assume  only  a  unique  value,  a  subscript 
variable  can  take  on  a  range  of  positive  integer  values.  Subscript 
variables  can  be  used  as  indices  in  array  element  references  or  in  the 
same  way  as  ordinary  variables  to  compose  complicated  expressions.  The 
meaning  of  subscripts  is  the  same  as  their  meaning  in  mathematical 
usage. 

The following definitions are used in discussing subscripts.

Definition Let X be an N dimensional array represented in the Array
Graph by a node. Let i be a positive integer. The tuple <X,i> is
referred to as a node subscript. It denotes the ith dimension of
the node of array X. Let a1 be an assertion node, and I a
subscript variable referenced in the assertion a1. The tuple
<a1,I> is referred to as a node subscript for I associated with the
assertion node a1. If <n,d> is a node subscript, then R(<n,d>)
denotes its range.


Node subscripts are grouped into range sets. Every range set
contains the node subscripts which have the same range. However, no two
dimensions of the same node can be put into one range set even if they
have the same range, because every range set will later correspond to a
level of nested loops in the generated program and no two dimensions of
the same node can correspond to the same level of loop nesting.


Definition  The  range  of  a  subscript  that  has  been  declared  as  a  global 
subscript is the same in all assertions where it is used. There
can  only  be  one  range  associated  with  a  global  subscript. 

Definition The range of a subscript that has not been declared as global
is fixed within the scope of the assertion where it is used. Such a
subscript is called a local subscript. A symbol used as a local
subscript can have different ranges in different assertions.

There are two types of global subscripts in MODEL. One is
specified by use of the qualifying keyword FOR_EACH in the prefix and a
repeating data structure name in the suffix. The other is explicitly
declared in a subscript declaration statement (refer to section
3.3.2). The FOR_EACH type global subscript always has the range of the
repeating data group named in its suffix. A user
declared global subscript can have its range specified in the subscript
declaration statement. By using global subscripts in assertions, the
user can specify explicitly the range of assertion subscripts.

Local subscripts are all of the form SUBn where n is a positive
integer. Users do not have to declare local subscripts (in a subscript
declaration statement). The use of local subscripts in an assertion is like that of
formal  parameters  in  a  function  definition.  They  can  be  chosen 
arbitrarily  within  the  scope  of  an  assertion.  This  gives  the  user 
freedom  to  reuse  the  subscript  names  in  different  assertions. 


5.4  DISCUSSION  OF  RANGE  PROPAGATION 
5.4.1  CRITERIA  FOR  RANGE  PROPAGATION 

In  this  section  we  discuss  the  conditions  for  propagating  the  range 
of  a  subscript  from  one  node  to  another.  A  node  subscript  refers  to 
either  an  array  dimension  or  an  assertion  subscript.  If  two  node 
subscripts  are  related  through  some  dependency  relation  and  one  of  them 
does  not  have  an  explicit  range  specification,  we  propagate  the  range 
from  one  to  the  other. 

Let us consider first a simple assertion: B(I) = A(I). Three
entities are involved: the source variable A, the target variable B,
and the assertion itself. All of them are one dimensional objects. The
assertion states that the kth instance of the assertion corresponds to
the kth instance of array B for all k in the range of B's dimension.
There is a bijective mapping between the instances of the assertion and
the instances of the array B. It is therefore natural to assume
that the range of the target variable B is the same as the range of the
assertion. Additionally, from the subscript expression I in the term
A(I) we can derive that the range of the assertion can be taken from the
range of the array A. In short, whenever a simple subscript variable is
used as a subscript expression, it strongly suggests that we may
propagate the range from one node subscript to another.


When a subscript expression of the form I-k is used in an
assertion, where I is a subscript variable and k is a positive integer,
there exists a one-to-one mapping between values of certain elements
indexed by I and I-k. The mapping may be interpreted in two possible
ways: assume the ranges of the arrays indexed with I and I-k
subscripts are the same, or assume that the variable with the I-k
subscript expression has k instances fewer than the variable with the I
subscript. We have decided to adopt the simpler assumption, that is,
the ranges are the same. Therefore we will propagate ranges between the
node subscripts indexed by subscript expressions I and I-k.

It should be noted that we do not intend to modify or ignore a user
specified range of a node subscript. The analysis mentioned above is
used for two purposes. The first is to derive a range for a node
subscript which does not have an explicitly specified range. The second
is to determine whether two node subscripts can be put into the same
range set when both of them have user specified ranges. In that case we
are interested in finding out whether their ranges are equal. Since
there is no simple way to determine if two functions are equal in
general, we will only check the assertions which define one range array
in terms of another range array.


5.4.2 PRIORITY OF RANGE PROPAGATION

User  specified  ranges  are  associated  with  repeating  data  structures 
or  declared  global  subscripts.  The  range  specified  for  a  data  node  is 
interpreted  as  the  range  of  its  least  significant  dimension.  Ranges  of 
node  subscripts  can  be  propagated  along  a  path  in  the  Array  Graph  from 
one  node  to  another  based  on  the  following  relations  between  respective 
node  subscripts. 

1.  The  two  node  subscripts  are  both  global  subscripts  and  have  the  same 
global  subscript  name. 

2.  One  of  the  node  subscripts  corresponds  to  a  dimension  of  a  data  node 
and  the  other  corresponds  to  the  same  dimension  number  of  the 
associated  control  variable. 

3. The two node subscripts occur on the corresponding dimensions of two
data nodes in the same data structure.

4. One node subscript is associated with an assertion node and the
other is associated with a source variable of the assertion.

5. One node subscript is associated with an assertion node and the
other is associated with the target variable of the assertion.

There may be several alternative paths (and directions) for
propagating a range, and the range derived for a node subscript may
depend on the choice of a path. The choice of path may also affect the
efficiency of the generated program. Therefore, we will propagate
ranges according to a priority order which attempts to obtain the
highest efficiency. The priority order is as follows.

When a global subscript is used in several assertions, the ranges
of the respective node subscripts (in these assertions) are the same.
We may consider all the node subscripts with the same global subscript
name as a group. Whenever any element in the group has its range
defined, we will propagate the range to the other elements in the same
group. This type of propagation will have the top priority.

Next consider the data nodes and their associated control variables
such as SIZE.X, END.X, POINTER.X, LEN.X, etc. The dimensions of
the  control  variables  correspond  to  the  dimensions  of  the  variable  named 
in  the  suffix  from  left  to  right.  The  corresponding  dimensions  of  a 
data  node  and  its  associated  control  variables  should  have  the  same 
range.  Similarly  the  corresponding  dimensions  of  a  data  node  and  its 
higher  level  nodes  in  a  data  structure  should  have  the  same  range. 

If the range specifications of local subscripts in assertions or of
array dimensions are not given explicitly, we will derive them by
analyzing the respective subscript expressions in assertions. It is
preferable to propagate the range from a target variable to an assertion
rather than from a source variable to an assertion. Therefore, the
range propagation between an assertion node and its target node or
between a data node and its associated control variable will have the
second priority.

Globally it is preferred to propagate the range from a variable in
an output file backward to a variable in an input file rather than in
the reverse direction. Thus we will assign the third priority to the
propagation from an assertion node backward to its source variables and
the fourth priority to the propagation from a data node forward to an
assertion node in which it is referenced as a source variable.

Example Let array A be an input file with 20 elements, array C an output
file with 10 elements and array B a one dimensional interim array.
The assertions

a1: B(I) = A(I) ;
a2: C(I) = B(I) ;

may lead us to assign either 20 or 10 as the range for array B,
depending on the point of view taken. As far as correctness is
concerned, it does not make any difference whether 20 or 10 is used
as the range of array B. But a smaller range would mean potentially
less memory space and less computation time. Therefore the latter
is more desirable. The range may be evaluated as follows. Since no
global subscripts are used here, no propagation corresponding to the
top priority can be achieved. The propagation between an assertion
node and its target variable has second priority; therefore, the ranges
of <C,1> and <B,1> should be propagated to <a2,I> and <a1,I>
respectively. The range of subscript <B,1> will be that of <A,1> or
<C,1>, depending on whether we give higher priority to the propagation
from <A,1> to <a1,I> or from <a2,I> to <B,1>. Since the latter has
the higher priority, the range is propagated from array C all the
way back to the assertion node a1. (Refer to Fig. 5.1.)

a1: B(I) = A(I) ;
a2: C(I) = B(I) ;

R(<A,1>) = 20
R(<a1,I>) = ?
R(<B,1>) = ?
R(<a2,I>) = ?
R(<C,1>) = 10

Fig. 5.1 Example of Range Propagation


In summary, we have divided the range propagation into four
priority levels. The top level is based on use of global subscripts.
The second level is based on the relation between a data node and its
associated control variables or between the assertions and their target
variables. The third level is to propagate the range from an assertion
backward to its source variables, and the fourth one is to propagate the
range from a data array forward to the assertions in which it is
referenced as a source variable.
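
The effect of the priority order on the example with arrays A, B and C
can be sketched in Python. This is a simplified fixed-point loop written
for illustration, not the union-based Algorithm 5.2; the rule table and
the function name propagate are assumptions. Each rule carries a
priority level, and after every change the rules are retried from the
highest priority.

```python
# Node subscripts of the example; None means "no range yet".
ranges = {"<A,1>": 20, "<a1,I>": None, "<B,1>": None,
          "<a2,I>": None, "<C,1>": 10}

# (priority, from, to): 2 = target variable to assertion,
# 3 = assertion backward to source, 4 = source forward to assertion.
# No global subscripts are used, so there are no priority 1 rules.
rules = [
    (2, "<B,1>", "<a1,I>"), (2, "<C,1>", "<a2,I>"),
    (3, "<a1,I>", "<A,1>"), (3, "<a2,I>", "<B,1>"),
    (4, "<A,1>", "<a1,I>"), (4, "<B,1>", "<a2,I>"),
]


def propagate(ranges, rules):
    """Fill undefined ranges, always preferring the highest priority rule."""
    result = dict(ranges)
    ordered = sorted(rules, key=lambda rule: rule[0])
    changed = True
    while changed:
        changed = False
        for _, src, dst in ordered:
            # A user specified range is never overwritten.
            if result[src] is not None and result[dst] is None:
                result[dst] = result[src]
                changed = True
                break          # restart from the highest priority
    return result


result = propagate(ranges, rules)
```

Under these assumptions the range 10 of <C,1> reaches <a2,I>, <B,1> and
<a1,I>, while <A,1> keeps its user specified range of 20.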


5.4.3  REAL  ARGUMENTS  OF  RANGE  FUNCTIONS 

Every  node  subscript  will  iterate  over  its  range  by  a  loop  control 
statement  in  the  generated  program.  A  node  in  the  Array  Graph  having  N 
node  subscripts  associated  with  it  will  have  an  N  level  nested  loop 
enclosing  it.  Every  loop  controls  the  iteration  of  a  corresponding  node 
subscript. We will show that the range specification of the node
subscripts may influence the order in which the loops can be nested
and the order of subscripts in a reference to a range array.

When  the  ranges  of  the  dimensions  of  an  array  are  all  constant,  the 
array  has  a  regular  shape.  We  can  access  all  of  the  array  elements  by 
[...] an optimal program.

A generalized solution would be to treat the range arrays as
functions and find the real arguments of the range functions. For
example, an N dimensional range array SIZE.X(I1,...,In) may be
considered as a function which maps an N tuple of integers I1, ..., In
to an integer value which is the range of the (N+1)th dimension of array
X. Every subscript of the range array may be viewed as corresponding to
an argument of the function. We will use the terms range array and
range function interchangeably. Some of the function arguments may not
affect the function value; that is, the range does not vary with the
value of these subscripts. The remaining arguments, which do play a
role in determining the actual value, are called real arguments of the
range function.
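
The notion of a real argument can be illustrated by brute force in
Python. The Processor itself finds real arguments by analyzing the
defining assertion rather than by enumeration, so the following is only
a sketch of the definition; the function names and the sample range
function are assumptions.

```python
from itertools import product


def real_arguments(range_fn, arg_ranges):
    """Return the 1-based positions of arguments that can change range_fn.

    arg_ranges[p] is the range of argument p+1; every argument runs
    over 1..arg_ranges[p].
    """
    real = []
    for pos in range(len(arg_ranges)):
        for point in product(*(range(1, r + 1) for r in arg_ranges)):
            base = range_fn(*point)
            # Vary only the argument at 'pos' and compare with the base value.
            if any(range_fn(*point[:pos], v, *point[pos + 1:]) != base
                   for v in range(1, arg_ranges[pos] + 1)):
                real.append(pos + 1)
                break
    return real


def size_x(i1, i2):
    """Hypothetical range function whose value depends on I2 only."""
    return 2 * i2


only_second = real_arguments(size_x, (3, 4))
```

Here only the second subscript is a real argument, so only the loop on
that subscript would have to enclose the loop whose range SIZE.X gives.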

By analyzing the assertion which defines a range array, we can find
all the real arguments of the range array. If the range of a node
subscript <n,d> is specified by a range array and the range array has
some real arguments, the real arguments of the range array should
correspond to some other node subscripts of node n. In the generated
program the loops which correspond to the real arguments should be
scheduled outside the loop which corresponds to the node
subscript <n,d>. For example, consider the specification in
Fig. 5.2(a). The range array SIZE.A has two real arguments, i.e.
<SIZE.A,1> and <SIZE.A,2>. Since the node subscript <A,3> references
the range array SIZE.A and the node subscripts <A,1> and <A,2>
correspond to <SIZE.A,1> and <SIZE.A,2> respectively, node subscripts
<A,1> and <A,2> will be stored in the real argument list of node
subscript <A,3>, as shown in Fig. 5.3. The loops iterating on <A,1>
and <A,2> will be scheduled outside the loop on <A,3>.
Similarly, we can find the real argument lists for <a1,K> and <B,3>.


Fig.  5.3  Real  argument  lists  of  node  subscripts 

Example We will show how transposing an array affects the mapping
between the real arguments of the range arrays. Let us examine the
following assertions.

B(I,J,K) = A(J,I,K) ;

SIZE.A(M,N) = h(M,N) ;

Assume that R(<A,1>) is equal to R(<B,2>) and R(<A,2>) is equal to
R(<B,1>). The range for subscript <B,3> is obtained from R(<A,3>),
which is given by SIZE.A. SIZE.B(N,M) should be equal to
SIZE.A(M,N). All we need is a permutation of subscripts to make the
range array SIZE.A the same as SIZE.B. A possible flowchart for the
loops enclosing nodes A and B is shown in Fig. 5.4.


DO <A,1> ;
  DO <A,2> ;
    DO <A,3> = 1 TO SIZE.A(<A,1>,<A,2>) ;
      A(<A,1>,<A,2>,<A,3>) ;
    END ;
  END ;
END ;

DO <B,1> ;
  DO <B,2> ;
    DO <B,3> = 1 TO SIZE.A(<B,2>,<B,1>) ;
      B(<B,1>,<B,2>,<B,3>) ;
    END ;
  END ;
END ;

Fig. 5.4 Transposition of real arguments of a range array

It  should  be  noted  that  the  order  of  the  node  subscripts  <B,1>  and  <B,2> 
in the range array reference SIZE.A(<B,2>,<B,1>) is significant in the
loop  control  statement  for  <B,3>.  Therefore,  in  the  real  argument  list 
associated  with  the  node  subscript  <B,3>  we  should  store  the  real 
arguments  in  the  order  of  <B,2>  followed  by  <B,1>.  (Refer  to  Fig.  5.5) 
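
The loop structure for the transposed array can be mirrored in a short
Python sketch. The sample SIZE.A values, the ranges of the first two
dimensions, and the dictionary representation of A and B are
assumptions made for illustration; only the ordering of the real
arguments in the range array reference is taken from the text.

```python
# Assumed sample data: R(<A,1>) = 2, R(<A,2>) = 3, jagged third dimension.
size_a = {(1, 1): 1, (1, 2): 2, (1, 3): 0,
          (2, 1): 3, (2, 2): 1, (2, 3): 2}

# Loops enclosing node A: <A,1> and <A,2> are outside the loop on <A,3>.
a = {}
for a1 in range(1, 3):
    for a2 in range(1, 4):
        for a3 in range(1, size_a[(a1, a2)] + 1):
            a[(a1, a2, a3)] = 100 * a1 + 10 * a2 + a3

# B(I,J,K) = A(J,I,K): <B,1> has the range of <A,2>, <B,2> that of <A,1>.
b = {}
for b1 in range(1, 4):
    for b2 in range(1, 3):
        # The real arguments are passed in the order <B,2>, <B,1>.
        for b3 in range(1, size_a[(b2, b1)] + 1):
            b[(b1, b2, b3)] = a[(b2, b1, b3)]
```

Swapping the two arguments of size_a in the loop on b3 would index the
range array with the wrong subscripts, which is why the order in the
real argument list matters.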

The data structures used are as follows. The total number of node
subscripts is denoted by $ALLSUBS. Every node subscript is assigned a
unique sequence number. A vector TERMC(DICTIND) of integers denotes the
kind of range specification used for the least significant dimension of
each node. It can have the values 1-4 to denote the following
conditions:

1: the data structure has a constant number of repetitions.

2: the range is specified by an END array.

3: the range is specified by a SIZE array.

4: the range is implied by reading an end of file.

The vector LTERMC provides the same information for node subscripts as
TERMC does for the nodes. The contents of TERMC and LTERMC are computed
by Algorithm 5.1.

Algorithm 5.1 Find User Specified Ranges
Output:

TERMC: The type of user specified range of every node in the Array
Graph.

LTERMC: The type of user specified range of every node subscript.

1. Initialize the vectors TERMC and LTERMC to 0.

2. For each node n, in turn do:

If attribute VARYREP=0, then TERMC(n)=1.

If attribute ENDB>0, then TERMC(n)=2.

If attribute SIZEB>0, then TERMC(n)=3.

3. For every node n, in turn do:

If TERMC(n) is not equal to zero, find the node subscript <n,d> which
corresponds to the least significant dimension of node n. Set the
LTERMC entry of the node subscript to TERMC(n).
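
The steps of Algorithm 5.1 can be sketched in Python as follows. The
dictionary representation of nodes and the parameter names are
assumptions; the attribute names VARYREP, ENDB and SIZEB are taken from
the algorithm, but their exact meaning in the Processor's dictionary is
not described in this chapter. The sketch follows the three steps
literally, so a later test in step 2 overrides an earlier one.

```python
def find_user_specified_ranges(nodes, least_significant):
    """Return (TERMC, LTERMC) as dictionaries.

    nodes maps a node name to its attribute dictionary;
    least_significant maps a node name to the node subscript of its
    least significant dimension.
    """
    termc = {}
    for name, attrs in nodes.items():          # steps 1 and 2
        code = 0
        if attrs.get("VARYREP") == 0:
            code = 1                           # constant repetition
        if attrs.get("ENDB", 0) > 0:
            code = 2                           # END array
        if attrs.get("SIZEB", 0) > 0:
            code = 3                           # SIZE array
        termc[name] = code

    ltermc = {}
    for name, code in termc.items():           # step 3
        if code != 0:
            ltermc[least_significant[name]] = code
    return termc, ltermc


# Hypothetical nodes; the attribute values are made up.
nodes = {
    "X": {"VARYREP": 0},
    "Y": {"VARYREP": 1, "ENDB": 7},
    "Z": {"VARYREP": 1, "SIZEB": 4},
    "W": {"VARYREP": 1},
}
least_significant = {"X": ("X", 1), "Y": ("Y", 2),
                     "Z": ("Z", 1), "W": ("W", 1)}
termc, ltermc = find_user_specified_ranges(nodes, least_significant)
```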

Three arrays, HEADER, SETNEXT, and LRANGEP, are used in step 2.
Each of them has $ALLSUBS entries. HEADER(I) gives the
sequence number of the header element of the block to which the Ith node
subscript belongs. SETNEXT(I) links the Ith node subscript to the next
node subscript in the same block, if any. When the Ith node subscript
is the header of a block, LRANGEP(I) shows the range of the Ith
subscript. Algorithm 5.2 partitions the set of all the node subscripts.
Initially every node subscript forms a block by itself. Then whenever
we find that two node subscripts could have the same range and no range
conflict would occur, we merge their blocks. This merging process
continues until no further merging can be done. Since every node
subscript can only be in one block at any moment, this is in fact a
disjoint-set union problem [AHU 74]. The blocks formed in Algorithm 5.2
are called range sets.

Algorithm 5.2 Propagation of Range Specification
Input:

LTERMC: The type of user specified range for every node subscript.

Output:

RANGE: A field in the LOCAL_SUB data structure of every node subscript.
It contains the number of the range set to which the node subscript
belongs.

$RNGSET: The total number of range sets.

SET3RNG: The node number of the header of a range set.

Data structures:

$ALLSUBS: The total number of node subscripts.

HEADER($ALLSUBS): The node number of the header of the range set of a
node subscript.

SETNEXT($ALLSUBS): For every node subscript, it points to the next node
subscript of the same range set.

LRANGEP($ALLSUBS): If a node subscript is not the header of any range
set, the value is -1. Else, if the node subscript has a user
specified range, the value is the data node number of the range.
Otherwise, the value is 0.

1. Initialization.

Make every node subscript a block by itself. For all values of I
from 1 to $ALLSUBS do:

HEADER(I) = I,

SETNEXT(I) = 0,  /* NO NEXT ELEMENT */

LRANGEP(I) = node of the range,  /* IF IT HAS A DEFINED RANGE */

           = 0.  /* OTHERWISE */

2.  Merge  blocks  of  the  same  global  subscript  name: 

For  every  node  subscript  with  sequence  number  I,  check  whether  it  has 
a  global  subscript  name.  If  it  is  a  global  subscript  of  the  form 
FOR_EACH.X or a user declared subscript X, let J be the sequence number
of the node subscript which is associated with the least significant
dimension of node X. Call procedure UNION(I,J) to merge the blocks
containing  these  two  subscripts. 

3.  Propagate  ranges  between  data  nodes  and  control  arrays 
or  target  nodes  and  assertion  nodes: 

For every edge in the Array Graph with edge type not equal to 3, check
the type of the subscript expressions associated with the edge.
These edges connect data arrays to the associated control arrays and
the assertion nodes to their target variables. For every subscript
of the source node, find the corresponding subscript in the target
node. If the APR_MODE of the subscript expression is 1 or 2, merge
them using procedure UNION.

4.  Propagate  ranges  from  assertion  to  source  variable: 

Scan all the edges of type 3, which connect a source variable to an
assertion. The range is to be propagated backward. If the
subscript of the source node has a defined range, no merge will be
done. Otherwise check if the APH_MODE of the subscript expression is
1 or 2. If yes, call procedure UNION to merge it with the
corresponding subscript of the target node.

5. The same as step 4, except that no merge will be done if the
subscript of the target node has a defined range.

6. Check the header of each block. If it does not have a user defined
range, check the elements of the block. If there exists an element
which is associated with a data node at or above record level that
is the rightmost node in an input file structure, we may use
end-of-file as the default range.

7. Assign a range set number to every block of the partition. If a node
subscript belongs to the kth block, put k into the RANGE field in the
data structure LOCAL_SUB of the node subscript. Also store the node
number which gives the range information of the block in the
SET3RNG(k) entry.

Procedure UNION(I,J)

Input: 

I,J:  The  subscript  sequence  numbers  of  two  node  subscripts  for  which 
the  range  sets  will  be  merged. 

Output: 

Modify the data structures HEADER, SETNEXT, and LRANGEP to reflect
the merging of the two range sets.


1.  If  both  subscripts  I  and  J  are  in  the  same  block,  exit. 

2.  If  the  blocks  containing  subscript  I  and  J  have  different  ranges, 
exit. 

3. Put HEADER(I) into A.

4. Put HEADER(J) into B.

5. Change the HEADER entries of all the elements in the same block as J
to A.

6. Append the list with the header B to the list with the header A.

7. Replace LRANGEP(A) by LRANGEP(B) if LRANGEP(A)=0.

8.  Set  LRANGEP(B)  to  -1. 
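
Procedure UNION can be sketched in Python using the same three arrays.
The class name RangeSets and the range node numbers 101 and 102 are
assumptions for illustration. Step 2 is interpreted here as "both blocks
have a defined range and the two ranges differ", which is one reading of
the text rather than a confirmed detail of the Processor.

```python
class RangeSets:
    """Blocks of node subscripts kept in HEADER / SETNEXT / LRANGEP arrays."""

    def __init__(self, defined_ranges):
        # defined_ranges[i-1] is the range node of subscript i, or 0 if none.
        n = len(defined_ranges)
        self.header = list(range(n + 1))        # entry 0 is unused
        self.setnext = [0] * (n + 1)            # 0 means "no next element"
        self.lrangep = [0] + list(defined_ranges)

    def union(self, i, j):
        a, b = self.header[i], self.header[j]   # steps 3 and 4
        if a == b:                              # step 1: same block
            return False
        ra, rb = self.lrangep[a], self.lrangep[b]
        if ra != 0 and rb != 0 and ra != rb:    # step 2: conflicting ranges
            return False
        k = b
        while k != 0:                           # step 5: relabel J's block
            self.header[k] = a
            k = self.setnext[k]
        tail = a
        while self.setnext[tail] != 0:          # step 6: append B's list to A's
            tail = self.setnext[tail]
        self.setnext[tail] = b
        if ra == 0:                             # step 7
            self.lrangep[a] = rb
        self.lrangep[b] = -1                    # step 8
        return True


# Subscripts 1..5 stand for <A,1>, <a1,I>, <B,1>, <a2,I>, <C,1>; 101 and 102
# are made-up range node numbers for A and C.
rs = RangeSets([101, 0, 0, 0, 102])
rs.union(4, 5)            # a2 with its target C
rs.union(2, 3)            # a1 with its target B
rs.union(3, 4)            # source B with assertion a2
merged = rs.union(1, 2)   # A and a1 have different ranges: refused
```

With these calls the subscripts of a1, B, a2 and C end up in one block
whose header carries the range node of C, while <A,1> stays alone.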

Step three examines all the range sets. If the range of a range
set is specified by a range array, a real argument list (RAL) is
computed for every node subscript in the range set.

Algorithm 5.3. Propagation of Real Argument List
Input:

LTERMC: Type of user specified range of every node subscript.

RANGE: A field in the LOCAL_SUB data structure of every node subscript.
It contains the number of the range set to which the node subscript
belongs.

Output:

RALP: A field in the data structure LOCAL_SUB of every node subscript.
For every node subscript whose range is of type 2, 3, or 4, it
points to a list of real arguments of the range function.

Data structure:

The real argument list pointed to by RALP consists of a list of
elements which are stored in the data structure RAL. The fields
in the RAL are as follows.

$RAL: The number of real arguments.

RSPOS($RAL): The subscript position of a real argument in the range
array.

MSPOS($RAL): The subscript position of the corresponding real argument
in the node subscript list.

1. For each node subscript which has a user specified range whose
termination criterion is not constant, form the RAL for it and put it
into a candidate queue. (Refer to Algorithm 5.4.)

2. Iterate steps 3 to 7 until the candidate queue becomes empty.

3. Get a node subscript from the queue. Let it be the subscript S of
node X. Propagate the RAL of S to other node subscripts in steps 4,
5, 6, and 7. If any node subscript gets its RAL newly defined, put
it into the candidate queue so that its RAL can be propagated to
other subscripts.

4.  For  each  outgoing  edge  from  node  X,  propagate  the  RAL  of  subscript  S 
from  node  X  to  the  target  node.  (Refer  to  Algorithm  5.5) 

5.  For  each  incoming  edge  into  node  X,  propagate  the  RAL  of  subscript  S 
from  node  X  back  to  the  source  node.  (Refer  to  Algorithm  5.6) 

6.  If  subscript  S  references  a  global  subscript,  propagate  its  RAL  to 
the  global  subscript. 

7.  If  subscript  S  is  a  global  subscript,  then  propagate  its  RAL  to  all 
the  subscripts  which  reference  its  name. 

8. Stop.

Algorithm  5.4.  Find  RAL  from  a  range  specifying  assertion 

Suppose the range of the subscript <X,n> is specified by an
assertion. Let the range array be SIZE.X or END.X. The algorithm tries
to find the RAL for subscript <X,n>.

1. Put all the subscripts of the target variable of the assertion which
defines the control variable SIZE.X or END.X into a list.

2. If the target variable is END.X, delete the subscript on its least
significant dimension from the list.

3. For each subscript in the list, check whether it is referenced on
the right-hand side of the assertion. If so, it is a real argument.
Otherwise, delete it from the list.

4. The resulting list is the RAL of the subscript <X,n>.
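A minimal Python sketch of Algorithm 5.4 follows. It assumes that the subscripts of the target variable are given as a list ordered from most to least significant dimension, and that the subscripts referenced on the right-hand side are given as a set; the function name and both representations are assumptions made for the illustration.

```python
def find_ral(target_name, target_subscripts, rhs_subscripts):
    """Return the RAL for a range defined by SIZE.X or END.X.

    target_name: name of the range array, e.g. "SIZE.X" or "END.X".
    target_subscripts: subscripts of the target variable, assumed to be
        ordered from most to least significant dimension.
    rhs_subscripts: set of subscripts referenced on the right-hand side.
    """
    candidates = list(target_subscripts)             # step 1
    if target_name.startswith("END.") and candidates:
        candidates.pop()                              # step 2: drop least significant
    # step 3: keep only subscripts referenced on the right-hand side
    return [s for s in candidates if s in rhs_subscripts]   # step 4
```

For example, `find_ral("END.X", ["I", "J", "K"], {"I", "K"})` drops K in step 2 and J in step 3, leaving I as the only real argument.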

Algorithm  5.5.  Propagation  of  RAL  forward  along  an  edge 

Assume S1 is a subscript of node X and there is an edge E from node
X to node Y. The algorithm propagates the RAL of S1 to some subscript
of node Y.

1. If the subscript expression of S1 is not type 1 or type 2, exit.

2. Let the corresponding subscript of node Y be S2. If the RAL of S2 is
defined, exit.

3. If the ranges of S1 and S2 are different, exit.

4. For each subscript in the RAL of S1, check its subscript expression
type. If any one of them is not type 1, exit. Find their
corresponding subscripts in node Y and form a new list. If the
ranges of the corresponding subscripts are not the same, exit.

5. The newly formed subscript list is the RAL of S2.

Algorithm  5.6.  Propagation  of  RAL  backward  along  an  edge 

Assume S1 is a subscript of node X and there is an edge E from node
Y to node X. The algorithm propagates the RAL of S1 to some subscript
of node Y.

1. If there is no subscript of node Y corresponding to subscript S1,
exit.

2. Let the corresponding subscript of node Y be S2. If the RAL of S2 is
defined, exit.

3. If the ranges of S1 and S2 are different, exit.

4. For every subscript Xi in the RAL of S1, find its corresponding
subscript Yj of node Y.

4.1 Let the subscript position of Xi in the local subscript list of
node X be i.

4.2 Check the LOCAL_SUBS field in the data structure EDGE_SUBL
associated with edge E. If the jth LOCAL_SUBS entry is equal to i, the
jth node subscript Yj in the local subscript list of node Y
corresponds to Xi.

4.3 Check the APR_MODE corresponding to subscript Yj in edge E. If
it is not 1, exit.

4.4 Check the RANGE field of the node subscript Yj and that of
subscript Xi. If they are different, exit.

5. Form a subscript list which contains those subscripts Yj of node Y.
It is the RAL of subscript S2.

Algorithm  5.7.  Propagate  RAL  between  Global  subscripts 

Suppose subscript S1 of node X and subscript S2 of node Y have the
same global subscript name. The algorithm propagates the RAL of S1 to
S2.

1. If the RAL of S2 is defined, exit.

2. For each subscript T in the RAL of S1, get its range, say RT. Check
all the subscripts of node Y. If there is one and only one subscript
U which has the same range as subscript T, then subscript U is the
corresponding subscript of T. Otherwise, exit.

3. Form a subscript list which contains those subscripts U of node Y.
It is the RAL of S2.
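The uniqueness test in step 2 of Algorithm 5.7 can be sketched in Python as follows. The representation is assumed for the illustration: `range_of` is a hypothetical dictionary mapping each subscript to an identifier of its range, and returning `None` models the "exit" case in which no RAL is produced.

```python
def propagate_global_ral(ral_s1, range_of, subs_of_y, s2_ral=None):
    """Derive the RAL of S2 from the RAL of S1 (same global subscript name).

    ral_s1: list of subscripts forming the RAL of S1.
    range_of: dict mapping a subscript to an identifier of its range.
    subs_of_y: all node subscripts of node Y.
    s2_ral: existing RAL of S2, or None if it is not yet defined.
    """
    if s2_ral is not None:                      # step 1: already defined
        return s2_ral
    new_ral = []
    for t in ral_s1:                            # step 2
        matches = [u for u in subs_of_y if range_of[u] == range_of[t]]
        if len(matches) != 1:                   # none, or ambiguous: exit
            return None
        new_ral.append(matches[0])
    return new_ral                              # step 3
```

The match must be unique because the range is the only available link between a subscript of node X and a subscript of node Y; with two candidates of the same range there is no basis for choosing one.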


5.6 DATA DEPENDENCY OF RANGE INFORMATION

In section 4.4.2 we mentioned that range arrays cause implicit
data dependency relationships. The edges of type 13 and 14 in the Array
Graph represent this type of data dependency. However, it is not enough
to have only the edges from a range array SIZE.X or END.X to the node
X. For every node in the Array Graph, whether it is a data or
an assertion node, if one of its node subscripts is in a range
set whose range is defined by a range array, an edge should be drawn
from the range array to that node.

We can tell the range of every node subscript only after the range
propagation phase. Therefore, the correct time to add this type of data
dependency relationship is after we have found all the range sets. If a
range set has a range array as its range specification, then there will
be edges emanating from the range array and terminating at every node in
the range set. Subscript expressions of type 1 are associated with the
edges emanating from a SIZE range array. A subscript expression of type 2
is associated with the least significant dimension of an END range array,
and type 1 subscript expressions are associated with the other
dimensions of the END range array.


CHAPTER 6


SCHEDULING 


6.1  OVERVIEW  OF  SCHEDULING 

Through the phases of data dependency analysis, dimension
propagation, and range propagation we have analyzed the user's
specification and checked its consistency and completeness.
In a non-procedural programming language, the execution
sequence is not specified in the program specification. The objective
of this chapter is to determine the order of execution in performing the
specified computation. We have collected the needed information in the
convenient form of the Array Graph. The Array Graph contains all the
program activities as nodes and the data dependency relationships as
edges. The next step toward constructing a program is ordering the
program activities represented by the nodes of the Array Graph under the
constraints posed by: a) the edges of the Array Graph, and b)
considerations of computation efficiency. This method of synthesizing
the program is called scheduling here. As stated in Chapter 1,
efficient scheduling is one of the main contributions of the reported
research. Scheduling is followed by the actual program code generation.

Two rules which are frequently accepted in programming, except in
cases where memory limitations are extremely severe, will be followed
here as well. The first is that every input file is to be read only
once. This rule reduces the number of input activities, which are
usually relatively slow. If necessary we may store the input data in
memory for repeated use. However, the memory cost may sometimes
be very high, because external storage can hold far more data than
main memory. The second
rule is that no values are to be recomputed. This means that once an
element has been computed it will be retained as long as it is needed
for later reference.


6.1.1  A  BASIC  APPROACH  TO  SCHEDULING 


A  correct  but  often  inefficient  realization  of  a  computation  can  be 
obtained  through  the  following  scheduling  method.  Our  eventual  approach 
will  be  partly  based  on  this  simpler  basic  approach.  The  acyclic 
portions  of  an  Array  Graph  may  be  scheduled  very  simply  as  follows.  A 
topological  sort  algorithm  can  be  applied  to  obtain  a  linear  ordering  of 
the  nodes  in  the  graph  in  accordance  with  the  edge  constraints. 
Multi-dimensional  nodes  are  then  enclosed  within  nested  loop  controls. 
Every  loop  iterates  the  respective  node  over  the  instances  of  one  of  the 
distinctive  node  subscripts  of  the  node. 
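As an illustration of this basic approach for an acyclic graph, the Python sketch below sorts the nodes topologically and wraps every node in one loop per node subscript. The `DO`/`END` lines it emits are only a pseudo-notation for the flowchart-like schedule, and the function and data names are assumptions; it uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def basic_schedule(predecessors, subscripts):
    """Return a list of text lines describing a simple (unmerged) schedule.

    predecessors: dict mapping a node to the set of nodes it depends on.
    subscripts: dict mapping a node to the list of its node subscripts.
    """
    lines = []
    for node in TopologicalSorter(predecessors).static_order():
        dims = subscripts.get(node, [])
        for depth, sub in enumerate(dims):         # open one loop per subscript
            lines.append("  " * depth + f"DO {sub};")
        lines.append("  " * len(dims) + node)      # the node itself
        for depth in reversed(range(len(dims))):   # close the loops
            lines.append("  " * depth + "END;")
    return lines


# Assumed example: X -> a1 -> Y, where a1 is two-dimensional and Y is scalar.
example = basic_schedule({"a1": {"X"}, "Y": {"a1"}},
                         {"X": ["I"], "a1": ["I", "J"], "Y": []})
```

Every node gets its own loops here, which is what makes the basic schedule correct but often inefficient: X and a1 both iterate over I in separate loops, so all of X must be stored before a1 starts.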

When there are cycles in the Array Graph, a topological sort will
not succeed. Superficially, a cycle in the Array Graph means a circular
definition which does not allow us to determine a linear order for the
computation. Actually, since the Array Graph masks some of the details
of the relationships in the corresponding Underlying Graph (see Chapter
4), there may be a cycle in the Array Graph where there are no cycles in
the corresponding Underlying Graph. Also, iterative solution methods can
be applied to perform the computations even where there are cycles in
the Underlying Graph. We have to apply a deeper analysis of the nodes
and subscript expressions used in assertions in the cycle. The cycles
that are found not to be truly circular can be resolved to generate a
linear schedule. The method employed is briefly described as follows.
The Array Graph is decomposed into subgraphs. Each subgraph is a
maximally strongly connected component (MSCC). A MSCC in a directed
graph is a maximal subgraph in which there is a path from any node to
any other node. The deeper analysis is then applied to the MSCCs in the
Array Graph. The analysis described in section 6.2 consists of a search
for a dimension that is common to all the nodes in the MSCC. If an edge
is found in the MSCC which has an I-k type subscript expression
associated with it, the edge may be deleted. This sometimes results in
an acyclic subgraph which can be topologically sorted. If this method
is not successful, then other analysis methods, or alternatively an
iterative solution method, may be applied.


6.1.2  EFFICIENT  SCHEDULING 

In general, a schedule which satisfies the constraints of the data
dependency relationships is not unique, if one exists. Therefore, there
is a degree of freedom to select a schedule which meets efficiency
requirements as well. We want to have a schedule with the fewest
loops or with the least amount of working storage for the program
variables. Although we will use here the results of the basic
scheduling approach mentioned above, our method of scheduling consists
essentially of a process of repeated merging of basic MSCCs in the Array
Graph. As will be shown, in this way we can reduce the use of memory
and computation time.

Non-procedural programming uses as many variables as there are values
that occur during the program computation. If we simply allocate
separate memory space to each variable, as may be done in the basic
approach, we will most probably get a program which uses a large amount
of memory space and in some cases may not be executable. Therefore, we
are here primarily concerned with the memory efficiency of the program.
Our approach is to examine the effect on the use of memory of merging
blocks of nodes of the same or related subscript ranges and forming
iteration loops for the selected subscripts enclosing the merged blocks.
We will select the mergers of blocks of nodes which reduce the use of
memory the most.

In some cases we have the alternative of maximizing the scope of one
loop at the cost of reducing the scope of one or more other loops. The
choice of which loop scopes are maximized is based on a comparison of
the memory requirements of the alternatives. The alternative that
requires the least memory space for program variables will be selected.

The repetitions indicated by the node subscripts are controlled by
loop statements. The execution of loop statements takes some CPU time.
If the loop scopes in a program are small, i.e. if they contain fewer
nodes, then there will be more loops in the program and the overhead
spent on the loop control statements will be increased. This is another
reason why it is desirable to maximize the loop scopes in the generated
programs.


6.1.3 OUTLINE OF THE CHAPTER

The material in sections 6.2, 6.3, and 6.4 forms a background to
understanding the optimization in the scheduling algorithm. In section
6.2 we will discuss the analysis of MSCCs. The algorithm of our
optimizing scheduler is based on deeper analysis of cycles. A similar
approach was used in an earlier version of the MODEL
Processor. Some changes discovered in the course of the presently
reported research have been added. The merger of components is
discussed in section 6.3. There are two bases for merging
components: when components have the same subscript ranges and when
they have related ranges (this is explained later). In section 6.4 we
will introduce the memory penalty concept which will be used to evaluate
the use of memory in a partially designed schedule. The memory penalty
is the memory cost associated with a candidate subschedule. The
scheduling algorithm is presented in section 6.5.


6.2 ANALYSIS OF MSCC


6.2.1 CYCLES IN THE ARRAY GRAPH

A cycle in the Array Graph means that a variable definition depends
directly or indirectly on itself. An Array Graph is a compact
representation of an Underlying Graph. It does not show the details of
precedence relationships in the Underlying Graph. Therefore, the
apparent circularity may be deceptive and not be reflected in the
Underlying Graph. In this case a correct computation may be realized
for an Array Graph cycle.

Consider for example the assertion in Fig. 6.1 which defines the
factorial function. Because of the recursive definition there is a
cycle in the Array Graph. But there is no cycle of precedence
relationship in the corresponding Underlying Graph. Therefore, there
exists a precedence ordered sequence for computing all the factorial
values.


a(I): F(I) = IF I=1 THEN 1 ELSE I*F(I-1) ;


(a) Assertion


[Figure: (b) Array Graph; (c) Underlying Graph]


Fig. 6.1 Example of cycles in the Array Graph


A MSCC in the Array Graph may or may not represent a circular
definition. If it is not truly circular, we may be able to perform the
respective computation by using an iteration loop. In section 6.2.2 we
will discuss the conditions under which a MSCC can be enclosed in a
loop. If these conditions are met, we will find the loop parameter to
bracket the entire MSCC. Once such a loop is found, since the loop
indices are ascending, the precedence relationships between the
respective loop instances are assured. Therefore, as shown in section
6.2.3, we delete edges with I-k subscript expressions and the MSCC may be
decomposed. If the above method fails, there are other approaches to
scheduling a MSCC, which will be discussed in section 6.2.4.


6.2.2  ENCLOSING  A  MSCC  WITHIN  A  LOOP 

The objective of iterative computations of a single data or
assertion node is to define all the elements corresponding to the values
of node subscripts associated with the node. In general, the values of
every node subscript can be stepped independently of other node
subscript values. Therefore, a node with N node subscripts would have
N levels of nested loops enclosing it, and each level of the nested loops
corresponds to one distinctive node subscript. We will associate with
every loop a loop variable with values which are stepped up by one from
one to the upper bound of a subscript range. All the nodes inside the
scope of a loop will be executed once for every possible value of the
loop variable. Generally, if a node does not have a node subscript
corresponding to a loop variable, the repetition would be redundant. We
want to treat an entire MSCC in some manner as a single node, i.e. to
compute all the elements of the nodes in the MSCC iteratively. We
require however that all the nodes of a MSCC have a node subscript with
which a loop brackets the MSCC. If one of the nodes does not have such
a node subscript then the activity represented by the node, such as
input/output, may be repeated, which will cause an erroneous
computation. All the distinguished dimensions must then have the same
range. It should be noted that the loop variable is stepped up by one
in each iteration, and no computation of a loop instance can depend on
any computations in later loop instances.

Given a MSCC in the Array Graph, we will first check if all the
nodes in the MSCC have more than zero dimensions. If every node does
have at least one dimension to schedule, we will then check the
subscript expressions on the edges of the MSCC to see if the entire MSCC
can be enclosed within a loop. The edges in the Array Graph represent
relationships between some elements of the nodes at the ends of the
edges. The subscript expressions associated with edges reveal more
precisely the precedence relationships between specific elements. In
the following we examine the subscript expressions associated with an
edge to determine if the nodes at the ends of the edge can be scheduled
within the scope of a loop.

Definition Let A be a node of n dimensions. Then A denotes the set of
all the instances of node A, i.e. A = {A(I1,...,In) |
1<=Ik<=R(<A,k>), for 1<=k<=n}.

Definition Let A be a node of n dimensions. Then A(Ii=C1, Ij=C2, ...)
denotes the set of all the instances of node A with the ith
subscript Ii being C1 and the jth subscript Ij being C2, etc.

Consider an edge from node A(J1,...,Jm) to node B(I1,...,In) in the
Array Graph:

B(I1,...,Ik,...,In) <- A(E1,...,Ep,...,Em)

where the J's and I's are the node subscripts of nodes A and B
respectively, and the E's are the subscript expressions of A. Consider
the subscript expressions of types 1, 2, 3, and 4.

1) If a subscript expression Ep is of type 1 and is equal to Ik, then
every element in B(Ik=c) depends only on the elements in A(Jp=c).
Since B(Ik=c) does not depend on any element in A(Jp=d) with d>c, the
Underlying Graph dependencies are satisfied if node A, followed by B,
are bracketed by a loop where the parameters of the iteration are the
pth dimension of A and the kth dimension of B. These are referred to
as the distinguished dimensions of A and of B.

2) If the subscript expression Ep is of type 2 or 3 and is equal to Ik-a,
then for any positive integer c every element in B(Ik=c) depends only
on the elements in A(Jp=c-a). Since the parameters of the bracketing
loops are in ascending order (in steps of 1), this assures that
A(Jp=d) is computed before B(Ik=c) for d<c. Thus it is permissible to
schedule nodes A and B into one loop, with Ik and Jp the distinguished
dimensions.

3) If the subscript expression Ep is of type 4, then for any positive
integers c and d every element in B(Ik=c) may depend on elements in
A(Jp=d). We will be conservative and assume that every element in
B(Ik=c) depends on at least one element in A(Jp=d) with d>c.
Therefore, it is impossible to designate the pth dimension of A and
the kth dimension of B as the distinguished dimensions for a loop.

Example Given an assertion a1 as follows. Let A and B be square arrays.
There is an edge from array node A to assertion node a1.

a1(I,J): B(I,J) = A(g,J);

where g is a type 4 subscript.

Consider the node set {A,a1}. Consider scheduling this set into
one loop with <A,1> and <a1,I> as their distinguished dimensions.
Let SA be {A(J1,J2) | J1=2} and SB be {a1(I,J) | I=1}. SB is in the
first instance of the loop and SA is in the second instance of the
loop, therefore SB precedes SA. Consider next the element a1(1,2)
of SB. We can find an element A(2,2) in SA which precedes a1(1,2)
because of the type 4 subscript on the <A,1> dimension. SB and SA then
precede each other in the Underlying Graph, and therefore cannot
be scheduled.

Example Given the assertion a2 below.

a2(I,J): Y(I,J) = X(I,J) + X(J,I);

X is a square array and subscripts <X,1>, <a2,I>, and <a2,J> have
the same range. We want to schedule the node set {X,a2} in one
loop with <X,1> and <a2,I> as the distinguished dimensions.

None of the subscript expressions used with node X is of type
4. However, in the term X(J,I) the subscript J occurs on the
distinguished dimension of X, i.e. <X,1>. Since <a2,J> does not
correspond to the distinguished dimension of node a2, it may be
scheduled in an inner level loop and iterate faster than <a2,I>;
therefore some array elements of X will be referenced before they are
defined. Thus we should not form a loop with these designated
distinguished dimensions.

From the examples above we know that the subscript expression on the
distinguished dimension of a node must not be a general expression and
it should correspond to the distinguished dimension of another node in
the same loop; otherwise the loop cannot be formed. Since the loop
instances run strictly upward starting from one and all the
subscript expressions on the distinguished dimensions are of the form I
or I-k, no reference goes to later loop instances; therefore, no
data dependency relationship is violated. In fact, by constructing the
loop we have divided the whole computation into many smaller tasks, where
every task corresponds to a loop instance. It should be noticed that
the formation of an outer loop does not exclude the possibility that the
original computation involves an unsolvable cycle. What we are assured of
is that the outer loop divides the original problem into smaller ones
which can be solved more easily.
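The test described in this subsection can be summarized in a short Python sketch. The encoding is an assumption made for the illustration: the subscript expression found on the distinguished dimension of the source node is written as a pair `(name, k)` meaning `name - k` (so `("I", 0)` is I and `("I", 1)` is I-1), and `None` stands for a general (type 4) expression.

```python
def edge_allows_loop(source_expr, target_loop_sub):
    """Decide whether one edge permits its two end nodes to share a loop.

    source_expr: (name, k) for an expression "name - k" with k >= 0 on the
        distinguished dimension of the source node, or None for a general
        (type 4) expression.
    target_loop_sub: the distinguished subscript of the target node.
    """
    if source_expr is None:                 # type 4: may refer to later instances
        return False
    name, k = source_expr
    # The expression must use the target's distinguished subscript itself,
    # in the form I or I-k, so that no later loop instance is referenced.
    return name == target_loop_sub and k >= 0

def can_share_loop(edges):
    """edges: list of (source_expr, target_loop_sub) pairs inside the MSCC."""
    return all(edge_allows_loop(expr, sub) for expr, sub in edges)


example_a1 = [(None, "I")]                          # A(g,J), g of type 4
example_a2 = [(("I", 0), "I"), (("J", 0), "I")]     # X(I,J) and X(J,I)
factorial = [(("I", 1), "I"), (("I", 0), "I")]      # F(I-1) referenced, F(I) defined
```

The first two lists correspond to the examples a1 and a2 above and are rejected; the factorial edges are accepted because both expressions have the form I or I-k.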


6.2.3  DECOMPOSING  A  MSCC  THROUGH  DELETION  OF  EDGES 

Consider now the case where an MSCC is scheduled in one loop based
on the tests described in the previous subsection. The nodes in the
MSCC each have a distinguished dimension which corresponds to the loop
variable. Also the subscript expressions associated with the
distinguished dimensions are of the form either I or I-k. We will show
in the following that where the parameter of the loop is stepped up from
one by a step of one, edges which have a subscript expression of
type 2, i.e. I-k, are superfluous and can be removed.

Consider an edge of the form B(...,I,...) <- A(...,I-k,...) where
I-k and I occur on the pth and the qth dimension of nodes A and B,
respectively. If nodes A and B are scheduled in the loop of I, then the
elements in A(Jp=I-k) have been evaluated in the (I-k)th loop instance and
the elements in B(Iq=I) are evaluated in the Ith loop instance. Since
the values of loop variables are ascending, every element of
A(Jp=I-k) precedes all the elements of B(Iq=I). This implies that the
precedence relation represented by the above edge is superfluous, as it is
enforced by the order of evaluation of the respective elements. In
short, when two nodes are scheduled in a loop of loop variable I, the
precedence relationship represented by subscript expression I-k is
subsumed by the order of loop execution. This is illustrated in
Fig. 6.2, showing the Array Graph of a factorial function which is
defined with recursion. The recursion causes a cycle of two nodes {a1,
FAC}.


Fig. 6.2 Remove I-k edges in a loop


These two nodes can be scheduled in a loop iterating over node
subscript <a1,I>. The kth instance of the assertion a1 is evaluated in
the kth loop instance and it references the (k-1)th instance of the array
FACT, which has been evaluated previously in the (k-1)th loop instance.
Therefore the edge associated with subscript expression I-1 can be
removed. There is then no longer a cycle in the Array Graph.
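A small Python sketch of this edge deletion follows, using the two-node factorial cycle as data. The edge encoding `(source, target, k)`, where k is the offset in the expression I-k on the distinguished dimension, and the node names are assumptions made for the illustration.

```python
def drop_offset_edges(edges):
    """Keep only edges whose distinguished subscript expression is I (k == 0)."""
    return [(s, t, k) for s, t, k in edges if k == 0]

def has_cycle(edges):
    """Kahn-style check: the graph is cyclic if not every node can be removed."""
    nodes = {n for s, t, _k in edges for n in (s, t)}
    indegree = {n: 0 for n in nodes}
    for _s, t, _k in edges:
        indegree[t] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for s, t, _k in edges:
            if s == n:
                indegree[t] -= 1
                if indegree[t] == 0:
                    ready.append(t)
    return removed < len(nodes)


# The assertion a1 defines FAC(I) (expression I) and references FAC(I-1).
factorial_edges = [("a1", "FAC", 0), ("FAC", "a1", 1)]
inside_loop = drop_offset_edges(factorial_edges)
```

Here `has_cycle(factorial_edges)` holds, while `inside_loop` keeps only the edge from a1 to FAC and is acyclic, so the loop body can be ordered as a1 followed by FAC.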


6.2.4  OTHER  APPROACHES  TO  DECOMPOSING  AN  MSCC 


There are a number of methods for scheduling a MSCC in an Array
Graph. We have been primarily interested in the cases in which a cycle
can be implemented by a loop whose parameter runs upward from one.
However, not all cycles can be implemented with this simple loop
mechanism. Thus if the above approach fails it will be necessary to
apply other methods. Consider first the case where the array elements
may be evaluated in a sequence which does not follow the natural
ascending order of subscripts. Consider for example the following
specification which defines A, a vector of 50 elements.


Example

A(I) = IF I=25 THEN X
       ELSE IF I<25 THEN A(I+2)+X
       ELSE A(I-1)+A(I-25) ;

A possible PL/I program to compute array A is as follows.

A(25) = X ;
DO I = 23 TO 1 BY -2 ;      /* odd elements 23, 21, ..., 1 */
   A(I) = A(I+2)+X ;
END ;
A(26) = A(25)+A(1) ;        /* needed before A(24) */
DO I = 24 TO 2 BY -2 ;      /* even elements 24, 22, ..., 2 */
   A(I) = A(I+2)+X ;
END ;
DO I = 27 TO 50 ;           /* remaining elements in ascending order */
   A(I) = A(I-1)+A(I-25) ;
END ;
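The evaluation order of the PL/I program above can be checked mechanically. The following Python transcription is only an illustration (the function names are assumptions): it computes A in the same order, using `None` for elements not yet defined so that any reference before definition would raise an error, and then checks every element against the specification.

```python
def compute_a(x):
    """Compute A(1..50) in the order used by the PL/I program (index 0 unused)."""
    a = [None] * 51
    a[25] = x
    for i in range(23, 0, -2):          # 23, 21, ..., 1
        a[i] = a[i + 2] + x
    a[26] = a[25] + a[1]
    for i in range(24, 1, -2):          # 24, 22, ..., 2
        a[i] = a[i + 2] + x
    for i in range(27, 51):             # 27, 28, ..., 50
        a[i] = a[i - 1] + a[i - 25]
    return a

def satisfies_spec(a, x):
    """Check every element of A against the non-procedural specification."""
    for i in range(1, 51):
        if i == 25:
            expected = x
        elif i < 25:
            expected = a[i + 2] + x
        else:
            expected = a[i - 1] + a[i - 25]
        if a[i] != expected:
            return False
    return True
```

With integer X the comparison is exact. The check confirms that A(26) must be computed between the two descending loops, because A(24) refers to A(26) and A(26) refers to A(1).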

When the subscript expressions are first order polynomials, we can
divide an array node into many parts and compute the parts of the array
separately [SHAS 78].

A cycle in the Array Graph may also be considered as a set of
simultaneous equations, and numerical methods such as Jacobi and
Gauss-Seidel iterations can be applied to solve the system of equations
[GREB 81]. Since splitting nodes in the Array Graph, as suggested by
Shastry, is complicated to apply, the MSCCs which cannot be decomposed
may be treated similarly to simultaneous equations and solved iteratively.
In this dissertation we will refer only to the cases in which a MSCC can
be decomposed as described above. The other methods are described in the
references.


6.2.5  A  SIMPLE  SCHEDULING  ALGORITHM 

The methods of scheduling an MSCC in a loop and attempting to
decompose a MSCC may have to be applied repeatedly, depending on the
outcome of each application. This section describes a simple scheduling
algorithm which incorporates repeated application of the methods
described earlier. It generates a correct schedule based on an Array
Graph. However, it does not take program efficiency into
consideration.

The algorithm consists of two mutually recursive procedures,
SCHEDULE_GRAPH and SCHEDULE_COMPONENT. Given any Array Graph as input,
the SCHEDULE_GRAPH procedure finds the MSCCs in the Array Graph. The MSCCs
are then sorted into a sequence {M1,M2,...,Mn} which retains the partial
order of the precedence relationships between the MSCCs.
The SCHEDULE_COMPONENT procedure then schedules each component separately.
If Si is the schedule of component Mi, the sequence {S1,S2,...,Sn} is
returned as the schedule of the original graph.

The input to procedure SCHEDULE_COMPONENT is an MSCC, say Mi. If
Mi is a single node component and there is no unscheduled node subscript
associated with it, the node itself is returned as the schedule of the
component. Otherwise, the component may be schedulable in a loop. The
procedure tries to find a loop variable which satisfies the requirements
discussed in the previous sections. If a loop variable is found, say I,
it then deletes the edges in component Mi with subscript expression I-k
and marks the distinguished dimensions of the nodes in Mi as scheduled.
Let Mi' denote the resulting graph. Then it calls the procedure
SCHEDULE_GRAPH to produce a schedule for the graph Mi'. After
SCHEDULE_GRAPH returns the schedule of Mi', a loop with loop variable I
whose loop body is the schedule of Mi' is formed by SCHEDULE_COMPONENT and
returned as the schedule of Mi. If no loop variable can be found,
SCHEDULE_COMPONENT sends a warning message to the user and calls the
procedures described in section 6.2.4 to decompose the MSCC.
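The two mutually recursive procedures can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the Processor's implementation: node subscripts that belong together are assumed to carry the same name (for example "I"), each edge carries a dictionary giving, for every subscript usable as a loop variable, the offset k of its expression I-k, and the MSCCs are found with Tarjan's algorithm, which the source does not prescribe. An undecomposable component is simply reported as `("UNRESOLVED", ...)` in place of the warning and the methods of section 6.2.4.

```python
def msccs_in_order(nodes, edges):
    """Tarjan's algorithm; returns the MSCCs as lists, predecessors first."""
    succ = {n: [] for n in nodes}
    for s, t, _offsets in edges:
        succ[s].append(t)
    index, low, stack, out = {}, {}, [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                comp.append(w)
                if w == v:
                    break
            out.append(comp)

    for n in nodes:
        if n not in index:
            visit(n)
    return out[::-1]                        # Tarjan emits successors first

def schedule_graph(nodes, edges, subs):
    return [schedule_component(c, edges, subs)
            for c in msccs_in_order(nodes, edges)]

def schedule_component(comp, edges, subs):
    inner = [e for e in edges if e[0] in comp and e[1] in comp]
    if len(comp) == 1 and not subs[comp[0]]:
        return comp[0]                      # nothing left to iterate over
    for var in subs[comp[0]]:               # look for a loop variable
        if (all(var in subs[n] for n in comp)
                and all(var in offsets for _s, _t, offsets in inner)):
            kept = [e for e in inner if e[2][var] == 0]   # delete I-k edges
            rest = {n: [s for s in subs[n] if s != var] for n in comp}
            return ("DO", var, schedule_graph(comp, kept, rest))
    return ("UNRESOLVED", sorted(comp))


# Assumed example: the factorial cycle {a1, FAC} plus a scalar node OUT.
nodes = ["a1", "FAC", "OUT"]
subs = {"a1": ["I"], "FAC": ["I"], "OUT": []}
edges = [("a1", "FAC", {"I": 0}), ("FAC", "a1", {"I": 1}), ("FAC", "OUT", {})]
schedule = schedule_graph(nodes, edges, subs)
```

For this input the cycle is enclosed in one loop over I with body a1 followed by FAC, and OUT is scheduled after the loop. The recursion terminates because every loop that is formed removes one subscript from each node of the component.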


6.3 MERGER OF COMPONENTS TO ATTAIN HIGHER EFFICIENCY

The basic scheduling algorithm, described above, consists
essentially of topological sorting of the nodes or MSCCs in the Array
Graph and of the enclosing of these entities within the scope of nested
loops for the respective dimensions. In contrast, the scheduling
algorithm offered here considers the Array Graph globally and
progressively merges components into the scope of a selected loop,
choosing the merger which most reduces the use of memory and computing
time. The scope of the loops in the schedule is thus progressively
enlarged.

Given  an  Array  Graph  as  input,  we  can  construct  a  component  graph 
where  every  MSCC  is  a  component  node  and  an  edge  is  drawn  from  component 
A  to  component  B  if  and  only  if  there  exists  an  edge  in  the  original 
Array  Graph  which  leads  from  a  node  in  the  component  A  to  a  node  in  the 
component  B.  The  component  graph  is  an  acyclic  graph.  Note  that  the 
MSCCs  in  an  Array  Graph  are  not  further  divisible.  The  merger  process 
starts  with  the  MSCCs  in  the  Array  Graph  as  the  basic  components,  and 
through  merger  it  creates  larger  components  progressively.  A  loop  scope 
can  be  the  union  of  some  MSCCs.  In  this  section  we  will  discuss  the 
merging  of  MSCCs  in  an  Array  Graph  into  the  scope  of  one  loop. 
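The construction of the component graph can be sketched in Python as below. The mapping `component_of`, which assigns every Array Graph node to the name of its MSCC, is assumed to be available from the MSCC analysis; the names used in the example are invented.

```python
def component_graph(edges, component_of):
    """Build the edge set of the component graph.

    edges: iterable of (source_node, target_node) pairs of the Array Graph.
    component_of: dict mapping every node to the name of its MSCC.
    """
    comp_edges = set()
    for source, target in edges:
        cs, ct = component_of[source], component_of[target]
        if cs != ct:                     # edges inside one MSCC are dropped
            comp_edges.add((cs, ct))
    return comp_edges


# Assumed example: the cycle {a1, FAC} forms M1 and the node OUT forms M2.
array_edges = [("a1", "FAC"), ("FAC", "a1"), ("FAC", "OUT")]
membership = {"a1": "M1", "FAC": "M1", "OUT": "M2"}
comp_edges = component_graph(array_edges, membership)
```

Since every cycle of the Array Graph lies entirely inside one MSCC, dropping the edges within a component is what leaves the component graph acyclic.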


6.3.1  MERGER  OF  COMPONENTS  WITH  THE  SAME  RANGE 

The condition for scheduling a set of components in one loop is that
every component in the scope of the loop has a distinguished dimension
corresponding to the loop variable. There are several conditions on
designating the distinguished dimension of a node in an Array Graph or a
Component Graph. First, the distinguished dimensions of the components
must be in the same range set and have a common range which specifies
the number of iterations of the loop. The loop variable is stepped up
by one in successive iterations. Therefore the elements of each
component will also be evaluated in this order. The
second condition is that the evaluation of each instance of a component
in a loop instance should not refer to values computed in later loop
instances.

Further, components to be merged into the scope of a loop may not
depend on any other component which does not have a distinguished
dimension and which in turn depends on one of the components to be
merged. The rule is that a set of components which can be scheduled in
one loop should be equal to its closure. The closure of a set of
components includes all the components which are reachable from any
component in the set and which also reach any component in the set. For
example, consider the component graph in Fig. 6.3. The components C1,
C2, and C4 have a common dimension I. Still, they cannot be merged into
the scope of a loop with the loop variable I. The closure of the set of
components {C1, C2, C4} includes component C3. Since C3 does not
iterate with subscript I, it cannot be scheduled in the loop of I.
Component C4 can be scheduled only after component C3. Therefore, at
most we can merge components C1 and C2, or C2 and C4, into the scope of
a loop.


[Figure: component graph showing a set of components and the closure
of the set]

Fig. 6.3 Closure of a set of components
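
The closure rule can be sketched in a few lines of Python. The edge set
below is an assumption chosen to be consistent with the description of
the example (the original figure is not reproduced here), and the
function names are hypothetical.

```python
def reachable(start, adjacency):
    """All nodes reachable from any node in `start` by one or more edges."""
    seen, work = set(), list(start)
    while work:
        node = work.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def closure(components, edges):
    """The set plus every component lying on a path between two members."""
    succ, pred = {}, {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        pred.setdefault(b, []).append(a)
    between = reachable(components, succ) & reachable(components, pred)
    return set(components) | between

def can_share_loop(components, edges):
    """Necessary condition only: the set must equal its own closure."""
    return closure(components, edges) == set(components)

# Assumed edges: C1 -> C2, C1 -> C3, C3 -> C4, C2 -> C4.
EDGES = [("C1", "C2"), ("C1", "C3"), ("C3", "C4"), ("C2", "C4")]
```

With these assumed edges, {C1, C2, C4} fails the test because C3 lies
between C1 and C4, while {C1, C2} and {C2, C4} each pass it.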


The  search  and  selection  of  a  distinguished  dimension  for  each 
component  in  a  set  is  similar  to  the  analysis  of  subscript  expressions 
in  MSCCs  described  in  section  6.2.  We  showed  there  that  the  subscript 
expressions  associated  with  edges  terminating  at  a  component  can  not  be 
type  4  and  that  subscript  expressions  associated  with  the  edge  should 
connect  the  distinguished  dimensions  of  the  components  at  the  ends  of 
the  edge. 


6.3.2  MERGER OF COMPONENTS WITH SUBLINEARLY RELATED RANGE

In the previous subsection, we considered merging components with
distinguished dimensions which have exactly the same range as the loop
variable. Every node is then executed once in each loop instance.

There is a large class of cases where subscript expressions are
explicitly related, i.e. where we use an indirect subscript X(I) and X
is a function of I. Statements with such an indirect subscript may in
some cases be conditionally executed in the scope of a loop for the
parameter I. We will require that the indirect subscript expression
X(I) have values which grow monotonically and more slowly than those of
the loop variable I. This feature of sublinearity was already mentioned
in section 4.4.2. As explained in [PNPR 80], the use of indirect
sublinear subscripts is important in many instances, such as selecting
a subset of records from a sequential file or merging two sequential
files into one.

In section 4.4.2 we discussed the criterion for recognizing a
vector which can be used for indirect indexing. The values of the
elements of an indirect indexing vector grow more slowly than the
subscript values of the elements. The range of its dimension will be
called here the major range, while the range of its content will be
called a subrange relative to the major range. For example, the
variable X in Fig. 6.4 satisfies these criteria. X is used in the
subscript expression of the first dimension of node A, and therefore
R(<X,1>) is a major range and R(<A,1>) is a subrange relative to
R(<X,1>).


    X(I) = IF I=1 THEN 1
           ELSE IF <condition is true> THEN X(I-1)+1
           ELSE X(I-1) ;

    B(I) = A(X(I)) ;


Fig.  6.4  Example  of  indirect  sublinear  indexing 
in  subscript  expression 
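
As an illustration of the vector defined in Fig. 6.4, the Python sketch
below builds X from a list of condition values and checks a sublinearity
property. The check used here (X(1) = 1 and each step raises X by 0 or
1) is our reading of the figure; the exact criterion of section 4.4.2 is
not restated in this section, so treat it as an assumption. The function
names are hypothetical.

```python
def build_index(conditions):
    """X(1) = 1; X(I) = X(I-1)+1 if the condition holds, else X(I-1).

    conditions[0] belongs to I = 1 and is ignored, as in Fig. 6.4.
    """
    x = []
    for position, condition in enumerate(conditions):
        if position == 0:
            x.append(1)
        elif condition:
            x.append(x[-1] + 1)
        else:
            x.append(x[-1])
    return x

def is_sublinear(x):
    """Assumed check: starts at 1 and never steps by more than 1."""
    if not x or x[0] != 1:
        return False
    return all(later - earlier in (0, 1) for earlier, later in zip(x, x[1:]))
```

A vector that passes this check never exceeds its own subscript, which
is the sense in which X grows more slowly than I.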


A subrange relative to a major range may itself be the major range of
some other subranges. Therefore, the sublinear relationships between
the ranges may form a tree with the maximal major range at the root. We
merge major ranges and subranges in bottom-up order. By progressively
merging each subrange with the next level major range, we finally
obtain a loop which iterates over the maximal major range and in which
all of its subranges are nested. Such a merger of subranges may not
always be possible. For example, if a type 4 subscript expression is
used in the distinguished dimension of a component, the precedence
relationship will prevent us from scheduling this component into the
scope of a loop.

When a set of components with a subrange and a major range are
merged into the scope of a loop, the major range is used as the
loop range, and the values of the elements of the indirect indexing
vector are checked so that only the elements within the subrange are
evaluated. An instance of the subrange is executed each time the
indirect indexing vector steps up by 1. The computation of the indirect
index should precede the computation of any node within the subrange.
This introduces an additional precedence relationship.
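
A merged loop of this kind might look like the following Python sketch
for the statements of Fig. 6.4. It is an illustration under stated
assumptions, not generated Processor output: `a_of` is a hypothetical
stand-in for whatever defines A over the subrange, and X(0) is treated
as 0 so that the first iteration counts as a step.

```python
def merged_loop(conditions, a_of):
    """Iterate over the major range; run subrange nodes only when X steps."""
    x_prev = 0            # assumed X(0) = 0, so X(1) = 1 is a step
    a_current = None      # one element of A is enough inside the loop
    b = []
    evaluations = 0
    for i, condition in enumerate(conditions, start=1):
        # The indirect index is computed before any subrange node.
        if i == 1:
            x = 1
        elif condition:
            x = x_prev + 1
        else:
            x = x_prev
        if x != x_prev:               # X stepped up: new subrange instance
            a_current = a_of(x)
            evaluations += 1
        b.append(a_current)           # B(I) = A(X(I)) runs on every pass
        x_prev = x
    return b, evaluations
```

The loop runs once per element of the major range, while the subrange
node A is evaluated only as many times as X takes distinct values.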

We will treat subscript expressions of types 5, 6, and 7 similarly to
types 1, 2, and 3, respectively, in checking the consistency of the
subscript expressions of the distinguished dimensions as discussed in
section 6.2.1. If a check of the subscript expressions of the
distinguished dimensions fails, i.e. some type 4 subscript expressions
are used or the subscript expressions do not connect distinguished
dimensions of the components, we will treat these indirect subscript
expressions of types 5, 6, and 7 as type 4. If the check succeeds, we
will add edges in the Array Graph from the indirect indexing vector to
the nodes referencing it. This is similar to the addition of edges from
a range array to the nodes referencing the range array.


6.4  MEMORY  EFFICIENCY 

In some cases the same memory space may be shared by a number of
variables, thereby using memory storage more efficiently. Small savings
of memory space are not worth the cost of the analysis. For example,
sharing memory space among a few scalar variables does not save much
memory space. Our approach will concentrate on having elements of the
same array share memory space. Since the range of each array
dimension is in general large and there are several dimensions, the
saving should be considerable. It should also be noted that memory
space is statically allocated to the variables in the produced program.
Compared with dynamic memory allocation, static memory allocation has
the advantage of simplifying the program control, in that there is no
need to allocate memory space at run time. This also facilitates
efficient random access to array elements.


Three alternative approaches to allocating memory are used:

1.  Physical Dimension

    If all the elements along some array dimension have different
    memory spaces assigned to them, the memory space allocated is
    proportional to the range of the array dimension. This method of
    allocating memory will be referred to in the following as the
    physical dimension.

2.  Virtual Dimension

    If all the elements along some array dimension share the same
    memory space, a single element memory space serves for the entire
    array dimension. We will refer to this method of allocation as
    virtual dimension.

3.  Window of Width k+1

    In some cases there is no need to store all the elements in an
    array dimension in main memory. But an array reference of the form
    A(I-k) makes it necessary to keep k+1 array elements in main memory
    at any moment. This type of memory allocation will be referred to
    as window of width k+1.

For every array dimension we have to decide how the memory space is
to be allocated. The memory allocation decision is related to the
program execution sequence. Different program schedules may require
different memory allocation approaches. For example, Fig. 6.5 shows two
different schedules for copying a file. The one which reads all the
records into main memory and then writes them out takes more memory
space than the other one, which copies the file record by record.


    Schedule-1                  Schedule-2

    DO I ;                      DO I ;
      READ(A(I)) ;                READ(A(I)) ;
    END ;                         B(I) = A(I) ;
                                  WRITE(B(I)) ;
    DO I ;                      END ;
      B(I) = A(I) ;
    END ;

    DO I ;
      WRITE(B(I)) ;
    END ;


Fig.  6.5  Two  schedules  for  copying  a  file 
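
The difference between the two schedules can be made concrete with a
small Python sketch. It is only an illustration: a Python list stands in
for the file, and each function returns, next to the copied records, the
number of array elements that must be held at once (our own bookkeeping,
not something the Processor computes this way).

```python
def schedule_1(records):
    """Three separate loops: A and B must both be physical."""
    n = len(records)
    a = [None] * n
    b = [None] * n
    out = []
    for i in range(n):            # DO I; READ(A(I)); END;
        a[i] = records[i]
    for i in range(n):            # DO I; B(I) = A(I); END;
        b[i] = a[i]
    for i in range(n):            # DO I; WRITE(B(I)); END;
        out.append(b[i])
    return out, len(a) + len(b)

def schedule_2(records):
    """One merged loop: A and B are virtual, one element each."""
    out = []
    for record in records:        # DO I; READ; assign; WRITE; END;
        a = record
        b = a
        out.append(b)
    return out, 2
```

Both schedules produce the same output; only the storage held during
the computation differs, growing with the file in the first case and
staying constant in the second.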


In the following we will show how the memory allocation decisions
are influenced by the program schedule and how the memory space
requirement for the program variables is evaluated.


6.4.1  EVALUATION OF MEMORY USAGE

We will first consider in what units we should allocate memory
space. If a data structure or substructure is used as an argument of a
function or an operation, the whole structure must be passed between
program modules. The relative position of its constituent elements
becomes important to the computation. Therefore we cannot allocate
memory space to its elements separately. On the other hand, economical
allocation of memory space requires that the unit be as small as
possible. We will require that all operations operate on fields.
Operations on higher level structures must therefore be transformed
into operations on elementary data structures. The memory space will
therefore be allocated in units of fields.

The array dimensions above the unit data structure will be
considered as logical array dimensions for which there may not be
corresponding physical dimensions in the allocated memory space. One of
the three approaches mentioned above may be used to allocate memory
space. Since a virtual dimension requires less memory space than a
physical dimension, we would not physically allocate memory space to an
array dimension unless it is necessary based on the logic of the
specification. In the following we will discuss the conditions under
which an array dimension has to be physical or a window of width k.

The values of data structures may be produced by some program
activities, such as reading an input file or evaluating an expression,
and consumed by other activities, such as writing an output file or
referencing an expression. If the production and consumption of the
elements along an array dimension do not proceed in a planned order,
then none of the array elements that are produced can be discarded. All
must be stored simultaneously in main memory.

Given a program schedule we can check whether the program
activities which produce or consume the values along an array dimension
are all in one loop. If not, that array dimension should be a physical
dimension. If all the definitions and references of an array are in the
same loop, we should further check whether any type 2 or 3 subscript
expressions are used, because the occurrence of an I-k type subscript
implies the necessity of keeping the previous k elements while
computing a new array element. Thus the memory space for the array
dimension should be a window of width k+1. It should be noted that if
an array has its distinguished dimension using either a finite window
or a physical dimension memory allocation scheme, all the array
dimensions whose loops are scheduled nested inside the current loop
have to be physical dimensions. This is illustrated in Fig. 6.6, where
a two dimensional array A is computed by a nested loop. Suppose the
outer loop iterates over the first dimension of A, i.e. <A,1>. The
presence of the subscript expression I-1 requires a memory allocation
scheme of a window of width two for the <A,1> dimension. Since the
elements of A are computed row by row and the computation of the
elements in one row depends on the values of the elements in the
previous row, we will have to allocate two rows of memory space for
array A.


    a1:  A(I,J) = IF I=1 THEN f(J)
                  ELSE g(A(I-1),J) ;

    (a) MODEL specification

    DO I ;
      DO J ;
        a1(I,J) ;
      END ;
    END ;

    (b) Schedule

    (c) Memory requirement: [diagram of array A not reproduced]

Fig. 6.6 Effect of window dimension on the outer loop
over dimensions on the inner loops
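
The two-row allocation of Fig. 6.6 can be sketched in Python as a
rolling buffer. This is an illustration only: `f` and `g` are passed in
as ordinary functions, `g` is assumed to receive the whole previous row
together with J, and indexing the window by I mod 2 is one common way
to realize a window of width two, not necessarily the one generated by
the Processor.

```python
def compute_last_row(n_i, n_j, f, g):
    """Compute A row by row while storing only two physical rows."""
    window = [[None] * n_j for _ in range(2)]   # window of width 2 on <A,1>
    for i in range(1, n_i + 1):
        current = window[i % 2]
        previous = window[(i - 1) % 2]
        for j in range(1, n_j + 1):             # <A,2> stays physical
            if i == 1:
                current[j - 1] = f(j)
            else:
                current[j - 1] = g(previous, j)
    return window[n_i % 2]
```

Storage is 2 * n_j elements regardless of n_i, compared with n_i * n_j
if the first dimension were physical.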


After the memory allocation approach for every array dimension has
been determined, we can estimate the memory space requirement, which
will serve as a measure of the program quality. Given an N dimensional
array A, we can define the required memory space M for a node subscript
<A,i> as follows.

    M(<A,i>) = 1                        if the ith dimension is virtual,
             = k                        if using a window of width k,
             = upper bound of R(<A,i>)  if physical.

If an array dimension is not physical, the upper bound of its range is
not used in calculating the memory requirement. The upper bound is
needed to estimate the memory space for a physical dimension. Sometimes
the range of an array dimension is specified by an assertion and the
upper bound is not known until run time. In that case we can only
assume the upper bound is infinity unless the user has specified an
upper bound of the range in the data description statements. The memory
space for array A is the product of the M(<A,i>)'s over all the
dimensions of A. The total memory requirement of a program is the sum
of the memory space used by every array variable.
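
The estimate can be written out as a short Python sketch. The
dictionary layout for a dimension and the array names P and Q are
hypothetical, chosen only for this illustration; an unknown upper bound
is represented by infinity, as the text suggests.

```python
import math

def dimension_memory(kind, width=None, upper_bound=None):
    """M(<A,i>) for one dimension, by allocation scheme."""
    if kind == "virtual":
        return 1
    if kind == "window":
        return width
    if kind == "physical":
        # Unknown bound: assume infinity unless the user supplied one.
        return math.inf if upper_bound is None else upper_bound
    raise ValueError(f"unknown allocation kind: {kind}")

def array_memory(dimensions):
    """Product of M(<A,i>) over all dimensions of one array."""
    total = 1
    for dimension in dimensions:
        total *= dimension_memory(**dimension)
    return total

def program_memory(arrays):
    """Sum of the memory space of every array variable."""
    return sum(array_memory(dimensions) for dimensions in arrays.values())

# Hypothetical arrays: P is virtual x physical(50), Q is window(3) x physical(4).
EXAMPLE = {
    "P": [{"kind": "virtual"}, {"kind": "physical", "upper_bound": 50}],
    "Q": [{"kind": "window", "width": 3}, {"kind": "physical", "upper_bound": 4}],
}
```

Here P needs 50 units and Q needs 12, for a total of 62. A single
physical dimension with no known bound makes the whole estimate
infinite, which is a signal that the user should supply a bound.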


6.4.2  MEMORY PENALTY


Analysis of the loop scope leads to the selection of the memory
allocation scheme for the respective array dimension. The memory
penalty of a loop is defined as the memory cost of the arrays included
in the loop scope. The memory cost is the difference between the memory
requirements if the loop is formed and the memory requirements in the
ideal case (virtual dimension). In order to evaluate the memory
penalty of a loop, we first find all the nodes whose memory allocation
scheme is influenced by the construction of the considered loop.
Whenever an Array Graph edge crosses the loop boundary, a source or
target node of some node in the loop will be outside the loop.
Either one of the two nodes may require the physical memory
allocation scheme. For example, if an edge from a data node to an
assertion node crosses the loop boundary (i.e. the data node is in the
scope of the loop while the assertion node is outside), the data node
is defined in one loop and referenced outside it. Therefore, its array
dimensions have to be physical. Similarly, if the edge crossing the
loop boundary is from an assertion node to a data node, the dimension
of the target node has to be physical.

Each node under consideration may fall into one of the following
three categories, and the memory penalty can be computed accordingly.

1.  A physical dimension for the distinguished dimension. This category
    is recognized by the existence of an edge which crosses the loop
    boundary. The memory requirement in the ideal case is taken as that
    of a virtual dimension. The memory requirement for the loop is
    computed by multiplying the upper bounds of all the unscheduled
    dimensions and of the dimension that is considered for the loop.
    The difference is the penalty of the loop for this array.

2.  A virtual dimension for the distinguished dimension. In this case
    the loop boundary is not crossed by edges and all the subscript
    expressions on the distinguished dimension are type 1 subscripts.
    The memory penalty for a virtual dimension is zero.

3.  A window of width k+1 for the distinguished dimension. As in the
    virtual dimension category, no edges cross the loop boundary.
    However, subscript expressions of the form I-k on the
    distinguished dimension are allowed. The other unscheduled
    dimensions are considered to be physical dimensions. The penalty is
    computed similarly to the first category.

Example  Consider the memory penalty of the loop shown in Fig. 6.7. The
    ranges of subscripts I and J are 10 and 20 respectively, and every
    data element occupies one unit of memory space. The memory
    requirements in the ideal case for nodes A, B, C, and D are 1, 1, 1,
    and 1 respectively. The memory requirements if the loop is formed
    will be 10, 40, 1, and 200 respectively. Arrays A and D have to be
    physical and the first dimension of array B needs a window of width
    2. The memory penalty for this loop is the difference of 251 and
    4, i.e. 247 units of memory space.


[Figure: Array Graph with nodes A, B, C, and D and the scope of the
loop on I]

    MP(A) = 10 - 1          =   9
    MP(B) = 2*20 - 1*1      =  39
    MP(C) = 1*1 - 1*1       =   0
    MP(D) = 10*20 - 1*1     = 199

Fig. 6.7 Example of computing memory penalty
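
The arithmetic of this example can be checked with a few lines of
Python. The per-dimension factors are taken from Fig. 6.7; the table
layout and the function name are our own.

```python
from math import prod

# For each array: (factors if the loop is formed, factors in the ideal case).
FIG_6_7 = {
    "A": ([10], [1]),          # physical, range 10
    "B": ([2, 20], [1, 1]),    # window of width 2, then physical range 20
    "C": ([1, 1], [1, 1]),     # stays virtual
    "D": ([10, 20], [1, 1]),   # physical in both dimensions
}

def penalty(formed, ideal):
    """Memory cost of one array: requirement with the loop minus the ideal."""
    return prod(formed) - prod(ideal)

penalties = {name: penalty(formed, ideal)
             for name, (formed, ideal) in FIG_6_7.items()}
total_penalty = sum(penalties.values())
```

The per-array penalties are 9, 39, 0, and 199, and their sum is the
247 units quoted in the example (251 minus 4).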


Information about the unscheduled dimensions may be used to compute
the penalty more accurately. For example, some array dimensions must be
physical dimensions because of the use of type 4 subscript expressions.
During the process of scheduling, we can accumulate such information to
speed up the memory penalty evaluations.


6.5  A  HEURISTIC  APPROACH  TO  MEMORY-EFFICIENT  SCHEDULING 

In general, there is a large number of schedules which can realize
the computation of a program specification. The schedule with the
minimal total memory requirement will be called an absolute optimal
program. In principle it should be possible to enumerate all the
possible schedules for an Array Graph, as there is a finite number of
them, and then evaluate the memory requirement of each schedule. We
would thus be able to find the absolute optimal schedule. For several
reasons this method is not practical. The program events being
scheduled are low level activities represented by nodes, i.e.
statements and variables, and an Array Graph may easily consist of
several hundred or even thousands of nodes. Also, the nodes in the
Array Graph may be multi-dimensional, and the number of combinations of
possible nested loops is very large. Further, the constraints on the
feasible schedules are complicated. Thus enumerating all the feasible
schedules would be prohibitive, and an exhaustive examination of all
the feasible schedules to find the absolute optimum is not acceptable.

Instead we have adopted the following heuristic approach. Given
an Array Graph as input, we first construct an acyclic component graph
with the MSCCs in the Array Graph as nodes. Our objective is to
repeatedly merge components in the component graph into blocks which
correspond to loop scopes. This process will be applied repeatedly to
the levels of nested loops. On the first application it will produce
the outer level loops. The blocks are formed by merging as many
components as possible which have the same or related ranges. The
process is repeated for each lower level of the nested loops, based on
the subgraph that corresponds to the higher level loop. This process
may not result in the absolute optimal program, as the outer level loop
scopes are determined without analysis of the effects of inner loop
structures on the use of memory space. However, considering the effect
of inner loops on memory usage is a complex process and would greatly
increase the number of alternatives that must be evaluated.
The scopes of the major loops in a program are maximized in our
proposed approach, and the inner loops have little or no effect on
memory usage. Thus this heuristic approach represents a good compromise
between the amount of analysis involved and the payoff in reducing
memory usage.

On  each  level  of  loops,  the  scheduling  process  consists  of  a  trial 
scheduling  for  every  range  set  in  the  corresponding  Component  Graph.  A 
loop  for  the  range  R  will  enclose  only  the  components  which  have 
dimensions  in  the  range  set  associated  with  range  R.  The  range  sets 
related  to  R  (through  sublinear  indirect  indexes)  will  later  be  merged 
with  the  blocks  of  range  R.  The  maximum  loop  scope  for  every  range  R  is 
the  range  set  of  R. 

The trial scheduling of each range set consists of finding the
closure of the range set and attempting to schedule the nodes in the
set which may be within the scope of the respective loop. We first
merge into a block the components in the range set which do not have
any predecessors in the closure of the range set. Progressively we
merge into the block other components which depend on those in the
block, as far as possible. The merger involves selection of a
distinguished dimension in each component, as described above. At the
end we evaluate the memory penalty of the loop scope obtained by the
trial scheduling. The loop with the smallest penalty is the one that is
finally scheduled. This process is repeated with the unscheduled
portion of the graph until all the components in the Component Graph
are scheduled.

There are many possible orders for merging components in the
closure of a range set to form the scope of a loop. For example, we
may arbitrarily pick a component in the middle of the Component Graph
and merge it with its neighbor components, or start with a component on
which no other components depend and merge the components backward.
However, considering all the possible orders of mergers would further
increase the number of alternatives that must be evaluated. The order
of mergers is unimportant in the case where the whole range set can be
scheduled in one loop, i.e. where all the array dimensions may become
virtual. No matter in what order we merge the components, we will
finally get the same loop scope. Again, we selected the forward merging
of the Component Graph as a good compromise between the quality of the
schedule and the amount of analysis.

It is necessary next to order the blocks associated with outer
level loops in an execution sequence. The memory cost will be the
same for any order that maintains the precedence relations between these
blocks. We choose to order the blocks by topological sorting. For
every outer level loop we mark the distinguished dimensions of the
blocks as scheduled.

We  apply  the  scheduling  algorithm  recursively  to  each  inner  nested 
level  loop  by  considering  only  the  subgraph  which  contains  the  nodes  in 
one  loop  scope.  The  resulting  schedule  will  be  the  body  of  the  outer 
level  loop. 

We will illustrate this process with an example of scheduling the
Array Graph shown in Fig. 6.8. Every node is an MSCC by itself, and the
initial Component Graph is in fact the Array Graph. The candidate
ranges are R(<A,1>) and R(<B,1>). Assume that the repetition numbers
are 500 and 200, respectively. The range set of R(<A,1>) contains three
nodes: A, a1, and C. The closure of {A, a1, C} is itself. If we
schedule the whole set into one loop, the penalty will be making array B
physical. On the other hand, the range set of R(<B,1>) contains two
nodes: B and a1. If this set is scheduled in one loop, the penalty will
be making both arrays A and C physical. We will select the loop of
R(<B,1>) since the size of array B is greater than the sum of the sizes
of arrays A and C. We mark the components B and a1 as scheduled. There
are two components left to be scheduled. We have no alternative but to
schedule each of them in a separate loop. The resulting schedule is
shown in Fig. 6.8(b).


6.6  THE SCHEDULING ALGORITHM


The scheduling algorithm, called SCHEDULE, is documented below.
The overall process is illustrated in Fig. 6.9. The solid lines show
procedure calls and the dashed lines show passing of parameters and
returns. The SCHEDULE process starts with construction of a reduced
form of the Array Graph, which will be modified in the course of
scheduling and is also easier to manipulate. It then calls a recursive
procedure SCHEDULE_GRAPH. This procedure accepts an Array Graph as
input and returns a schedule as output. SCHEDULE_GRAPH calls on a
number of procedures to perform its tasks. It first calls the procedure
STRONG to construct a Component Graph out of the reduced Array Graph (or
subgraphs of it in recursive calls).

Next, the major iteration in SCHEDULE_GRAPH schedules the outer
loop scopes. This iteration repeats until all the components in the
Component Graph have been scheduled. This major iteration loop first
finds all the candidate ranges.

Next there is a nested iteration for trial scheduling of all the
candidate ranges. It consists of calls to four procedures. Procedure
INDRSUB is called first to find the range set of each candidate range.
If a candidate range has some subranges related to it, the sets of the
subranges will also be included in the major range set. CLOSURE is then
called to get the subgraph for the closure of the range set. Then
MAX_SCHED is called to do a trial scheduling. MAX_SCHED accepts as
input a subgraph which consists of the closure of the respective range
set and returns as output a loop scope which contains the components in
the closure of the range set that have been trial scheduled. The trial
scheduling consists of repeated mergers into a loop scope of the
components in the closure of the range set which do not depend on any
other components. As a component is merged into the loop scope, it is
deleted from the subgraph of the closure of the range set. The merger
repeats until no more components can be scheduled. Procedure EVALUATE
is then called to compute the memory penalty associated with the loop
scope.
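
The forward merging performed during trial scheduling can be sketched
in Python as below. This is a simplified illustration, not the MAX_SCHED
procedure itself: it omits the selection and checking of distinguished
dimensions, represents the closure subgraph as a set of component names
plus (predecessor, successor) pairs, and uses a hypothetical function
name and example graph.

```python
def max_sched_sketch(closure_nodes, edges, range_set):
    """Repeatedly merge range-set components that have no remaining
    predecessor in the closure subgraph; return them in merge order."""
    remaining = set(closure_nodes)
    scope = []
    progress = True
    while progress:
        progress = False
        for comp in sorted(remaining):          # sorted() copies the set
            depends_on_remaining = any(
                succ == comp and pred in remaining for pred, succ in edges
            )
            if comp in range_set and not depends_on_remaining:
                scope.append(comp)
                remaining.remove(comp)          # delete from the subgraph
                progress = True
    return scope

# Assumed example: C3 lacks the loop dimension and therefore blocks C4.
EDGES = [("C1", "C2"), ("C1", "C3"), ("C3", "C4"), ("C2", "C4")]
CLOSURE = {"C1", "C2", "C3", "C4"}
```

A component outside the range set is never merged, so it stays in the
subgraph and blocks every component that depends on it. In the assumed
example only C1 and C2 end up in the loop scope.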


At  the  end  of  the  nested  iterations  for  all  the  candidate  ranges, 
SCHEDULE_GRAPH  selects  the  loop  scope  with  the  smallest  penalty.  It 
will  eventually  form  a  part  of  the  final  schedule.  The  components  in 
the  selected  loop  scope  are  first  merged  into  a  single  component  and 
then  marked  off  in  the  Component  Graph. 

The above major iteration loop is repeated, as noted above, until
the Component Graph is empty. The outer loop scopes are thus all found.
The corresponding components are topologically sorted. It is necessary
then to find the nested loop scopes, if any, for each outer loop scope
subgraph. As SCHEDULE_GRAPH selects the next component in the
topological sorting, it calls the procedure EXTRACT to extract these
subgraphs, which correspond to the selected loop scopes. Each of these
subgraphs must be internally scheduled. EXTRACT calls SCHEDULE_GRAPH
recursively to schedule each of the subgraphs. A component that is not
within a loop scope need not be further internally scheduled.


Fig.  6.9  Various  components  of  the  scheduling 
algorithm 


Global Data Structure for SCHEDULE

The reduced form Array Graph, constructed by the SCHEDULE procedure,
consists of a list of elements of type GNODE, with the following fields:

NXT_GNODE - A pointer to the next element in the list. (At the
            generation of the reduced form Array Graph all the GNODEs
            form a single list. During the process separate lists
            will link the GNODEs in each MSCC.)

NODE_ID   - The node number of the element in the dictionary.

SUXL      - A pointer to a list of edges connecting this element to
            its successors. Initially this is identical to the
            SUCC_LIST list. As the process proceeds, some of the
            edges are removed from this list.

The  components  in  the  reduced  Array  Graph  are  found  by  the  procedure 
STRONG.  STRONG  modifies  the  list  connecting  the  nodes  in  the  Array 
Graph  to  form  separate  lists  for  each  MSCC. 

The  initial  number  of  components  in  a  Component  Graph  is  denoted  as 
COMP_CNT.  Every  component  is  assigned  a  component  number  from  one  to 
COMP_CNT.  The  component  graph  is  defined  in  the  following  four  vectors. 

1) NODELST(COMP_CNT). Points to a list of GNODE elements in the Array
Graph which belong to the respective component.

2) ACOMP(COMP_CNT). A boolean value showing whether the component
exists in the component graph or not. In the course of the process,
when a component is merged into some other component, its
corresponding ACOMP bit is reset.

3) INCMP(COMP_CNT). A boolean value showing whether a component has
been scheduled or not; the bit is set while the component is still
unscheduled. Once a component has been scheduled, its corresponding
bit is reset, so that it will not be scheduled again.

4) CEDGES(COMP_CNT). Points to a list of edges which originate from the
component and end at its successor components. Every element in the
list has two fields. One field contains the component number of its
successor and the other is a pointer which points to the next edge.

A subgraph of the Component Graph can be represented by a bit vector
like INCMP. If a component is in the subgraph, its corresponding bit
will be set. Otherwise, the corresponding bit will be reset. In the
following, all the subgraphs of the Component Graph will use this
representation.
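
The four vectors can be pictured with a small sketch. The Python below is an illustration only, not the Processor's PL/I data structure: the class name ComponentGraph and the merge helper are assumptions introduced here, dictionaries stand in for the vectors, and Python lists and sets stand in for the linked lists. Components are numbered from 1 to COMP_CNT as in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentGraph:
    """Illustrative stand-in for NODELST, ACOMP, INCMP and CEDGES."""
    comp_cnt: int
    nodelst: dict = field(default_factory=dict)  # component -> node numbers
    acomp: dict = field(default_factory=dict)    # component -> still exists?
    incmp: dict = field(default_factory=dict)    # component -> unscheduled?
    cedges: dict = field(default_factory=dict)   # component -> successors

    def __post_init__(self):
        for c in range(1, self.comp_cnt + 1):
            self.nodelst.setdefault(c, [])
            self.acomp.setdefault(c, True)
            self.incmp.setdefault(c, True)
            self.cedges.setdefault(c, set())

    def merge(self, keep, others):
        """Merge the components in `others` into `keep` (hypothetical helper)."""
        others = set(others) - {keep}
        for c in others:
            self.nodelst[keep].extend(self.nodelst[c])
            self.nodelst[c] = []
            self.cedges[keep] |= self.cedges[c]
            self.cedges[c] = set()
            self.acomp[c] = False   # merged away: reset its ACOMP bit
            self.incmp[c] = False
        # Redirect edges that pointed at a merged component to `keep`.
        for c in self.cedges:
            if self.cedges[c] & others:
                self.cedges[c] = (self.cedges[c] - others) | {keep}
        self.cedges[keep].discard(keep)  # no self-loop on the merged component


g = ComponentGraph(4)
g.nodelst = {1: [10], 2: [11, 12], 3: [13], 4: [14]}
g.cedges = {1: {2}, 2: {3}, 3: {4}, 4: set()}
g.merge(2, [3])

# A subgraph is a bit vector over the components, like INCMP.
subgraph = {c: c in (2, 4) for c in range(1, g.comp_cnt + 1)}
```

After the merge, component 2 owns the nodes of both components and inherits the edge to component 4, while the ACOMP bit of component 3 is reset.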

The finally generated program schedule is structured as a list of
schedule elements. There are four types of schedule elements:
node-element, for-element, simul-element, and cond-element. A
node-element corresponds to a primitive program event in the generated
program such as the computation of an assertion, opening a file, or
reading a record. A for-element corresponds to a loop in the program.
The body of the loop is also represented by a schedule list and pointed
to from the for-element. Similarly, a simul-element corresponds to an
iterative computation for a simultaneous block and points to a list in
the body of the iteration. The cond-element is used to represent a
conditionally executed block which corresponds to the scope of a
subrange. It will point to the respective body list.

1) A node-element is a structure NELMNT, with the following fields:

NXT_NLMN  - Pointer to the next element in the schedule.

NLMN_TYPE - Equal to 1, denoting this is a node-element.

NODES     - The node number.

2) A for-element is a structure FELMNT, with the following fields:

NXT_FLMN   - Pointer to the next element in the schedule.

FLMN_TYPE  - Equal to 2, denoting this is a for-element.

ELMNT_LIST - Pointer to a program schedule which is the body of the
             loop.

FOR_NAME   - The dictionary node number of the loop variable.

FOR_RANGE  - The dictionary node number where the range of the loop
             variable is specified.

3) A simul-element is a structure SELMNT which is used for a
simultaneous equation block. It has the same structure as FELMNT
with FLMN_TYPE equal to 3.

4) A cond-element is used for a conditionally executed block. It has a
data structure similar to FELMNT except that the field FLMN_TYPE is
always equal to 4.
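
To make the nesting of schedule lists concrete, here is a minimal Python sketch of the four element types. The class names NodeElement and BlockElement and the helper max_nesting are assumptions made for illustration; the Processor itself uses PL/I structures linked by pointers, for which Python lists are substituted here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class NodeElement:          # type code 1
    node: int               # dictionary node number


@dataclass
class BlockElement:         # for- (2), simul- (3) or cond-element (4)
    type_code: int
    body: List["Element"] = field(default_factory=list)
    for_name: int = 0       # dictionary node number of the loop variable
    for_range: int = 0      # dictionary node number where the range is given


Element = Union[NodeElement, BlockElement]


def max_nesting(schedule: List[Element]) -> int:
    """Depth of the deepest block nested in the schedule (0 if none)."""
    depth = 0
    for elem in schedule:
        if isinstance(elem, BlockElement):
            depth = max(depth, 1 + max_nesting(elem.body))
    return depth


# One primitive event followed by a loop whose body holds a conditional block.
schedule = [
    NodeElement(5),
    BlockElement(2, for_name=7, for_range=8, body=[
        NodeElement(9),
        BlockElement(4, body=[NodeElement(10)]),
    ]),
]
```

The body of every block is itself a schedule, which is why the procedures that consume schedules are naturally recursive.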


Algorithm  6.1  SCHEDULE_GRAPH 
Input. 

G: A pointer to the reduced Array Graph which is represented by a
GNODE list.

L: The nesting level.

Output. 

A  program  schedule  for  the  input  graph  G. 

Data  Structures. 

GSIZE(COMP_CNT): The number of nodes in a component.

MINFREE(COMP_CNT): The minimum of the number of unscheduled
dimensions associated with any node in a component.

SUBRNGR($RNG_SET, $RNG_SET): A boolean matrix which shows the
subrange relationships. If the jth range set is a subrange of
the ith range set, then SUBRNGR(i,j) will be set to '1'B.

RNG_VEC($RNG_SET): For each range set, it indicates the node number
of the indirect indexing vector which reduces the major range
into this range set, if any.

1. Call procedure STRONG to find all the MSCCs in the Array Graph G
and then construct a Component Graph with each MSCC as a node.
Initially all the components are put in the Component Graph and the
corresponding ACOMP and INCMP bits are set to '1'B.

2. For each component, compute the corresponding element of the vector
GSIZE, which is the number of nodes in the component, and the
corresponding element in the vector MINFREE, which is the minimum of
the number of unscheduled dimensions associated with any node in the
component. Also compute the SUBRNGR matrix by scanning the indirect
subscript expressions used in the assertions, and the vector RNG_VEC
which gives for each range set number the node number of the
indirect subscript, if any.

3. If a component has MINFREE=0, it is not to be scheduled in any loop.
We mark it off from the Component Graph by setting the
corresponding INCMP bit to '0'B. This component will be a single
component block.

4. Repeat steps 5 to 11 to schedule all the outer level loops, until all
components in the Component Graph have been marked off.

5. Select the ranges of node dimensions which are not yet scheduled and
where the respective range does not have real arguments of
unscheduled subscripts. The selected ranges can be scheduled in the
outer level loops. The ranges of those node dimensions will be the
candidate ranges.

6. Repeat steps 7 to 10 for each range candidate. Steps 7 to 10 consist
of a trial scheduling of a range candidate Ri.

7. Call procedure INDRSUB. This procedure computes a subgraph S which
contains all the components which are in the range set of Ri or the
range set of a subrange of Ri. S is represented as a bit map
similar to INCMP.

8. Call procedure CLOSURE to find the subgraph S'=closure(S).

9. Call procedure MAX_SCHED with subgraph S' and range candidate Ri as
input parameters to form a loop scope Li which contains a subgraph
of S'. Li is represented as a bit map similar to INCMP.

10. Call procedure EVALUATE to compute the memory penalty of Li.

11. Choose the loop Lj with the smallest memory penalty. Merge all the
components in Lj into one component, say Ck, by modifying the list
pointed to by the NODELST of Ck to include all the GNODEs in the
other merged components. The ACOMP, INCMP, and CEDGES vectors are also
modified to reflect the new component. Then set INCMP(k) to '0'B to
mark the whole loop scope off from the Component Graph.

12. Do a topological sort over the resulting components of the component
graph where each component corresponds to either a single node or a
loop scope in the schedule to be returned.

13. Schedule each component separately. If there is no distinguished
dimension for the nodes in a merged component, a node-element will
be formed for the component. Otherwise, call the procedure EXTRACT
to form a for-element for the component.
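
Step 12 relies on a topological sort of the components that remain in the Component Graph. The source does not say which sorting method the Processor uses; the Python sketch below uses the standard predecessor-count (Kahn) method as an assumption, with dictionaries standing in for the ACOMP and CEDGES vectors and the function name topological_order chosen here for illustration.

```python
from collections import deque


def topological_order(acomp, cedges):
    """Return the live components in an order consistent with the edges.

    acomp:  dict, component number -> True if the component still exists
    cedges: dict, component number -> iterable of successor components
    Raises ValueError if the live components still contain a cycle.
    """
    live = [c for c, alive in acomp.items() if alive]
    live_set = set(live)
    predcnt = {c: 0 for c in live}
    for c in live:
        for succ in set(cedges.get(c, ())):
            if succ in live_set and succ != c:
                predcnt[succ] += 1
    ready = deque(c for c in live if predcnt[c] == 0)
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)
        for succ in set(cedges.get(c, ())):
            if succ in live_set and succ != c:
                predcnt[succ] -= 1
                if predcnt[succ] == 0:
                    ready.append(succ)
    if len(order) != len(live):
        raise ValueError("component graph is not acyclic")
    return order


# Component 3 was merged away earlier, so its ACOMP bit is reset.
example_acomp = {1: True, 2: True, 3: False, 4: True}
example_cedges = {1: [2, 4], 2: [4], 3: [1], 4: []}
example_order = topological_order(example_acomp, example_cedges)
```

Because every cycle has already been collapsed into a single component by STRONG and by the merging in step 11, the sort is expected to succeed on the resulting graph.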

Algorithm  6.2  STRONG 

Input. 

G:  A  pointer  to  an  Array  Graph. 

Output. 

NODELST:  A  list  of  components  which  are  the  MSCCs  of  the  input 
graph.  Every  component  is  represented  by  a  list  of  GNODE 
elements  which  belong  to  the  component. 

1. Clear the stack, the component count, the list of components
NODELST, and the variable COUNT. For each node v in the graph G set
DFNUMBER(v)=0.

2. For each node v in the graph G such that DFNUMBER(v)=0 call SEARCH(v)
to add the components reachable from v to the component list NODELST.

3.  Return  the  component  list  as  the  result. 

Algorithm  6.3  SEARCH 

Input. 

v: A node in a graph which is not examined yet.

Output. 

The  NODELST  for  all  the  MSCCs  reachable  from  node  v. 

1. Set COUNT to COUNT+1 and DFNUMBER(v), LOWLINK(v) to COUNT. Push v
on the stack.

2. Repeat the following substeps for each node w, a direct descendant
of v.

2.1 If DFNUMBER(w)=0, call SEARCH(w) and then let
LOWLINK(v)=min(LOWLINK(v),LOWLINK(w)).

2.2 Else, if DFNUMBER(w)>0 and w is on the stack, then let
LOWLINK(v)=min(DFNUMBER(w),LOWLINK(v)).

3. If LOWLINK(v)<DFNUMBER(v) then return.

4. Else, LOWLINK(v)=DFNUMBER(v). Node v is a root of a strongly
connected component. All the elements (above and including v) on
the stack are successively popped off the stack and linked into a
list - a subgraph which is defined as a component. This component
is placed on the top of a list of components pointed to by the
variable COMP_LIST. In addition a unique component number is
assigned to each node w in the current component.
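
STRONG and SEARCH together follow the depth-first scheme with DFNUMBER and LOWLINK values that is generally known as Tarjan's strongly connected components algorithm. The Python sketch below mirrors the steps of Algorithms 6.2 and 6.3 under a few assumptions: the graph is given as a dictionary from each node to its list of direct descendants, every descendant is itself a key of the dictionary, and a Python list of lists stands in for NODELST.

```python
def strong(graph):
    """Return the MSCCs of `graph` (dict: node -> list of successors)."""
    dfnumber = {v: 0 for v in graph}
    lowlink = {}
    stack, on_stack = [], set()
    components = []                 # plays the role of NODELST
    count = 0

    def search(v):
        nonlocal count
        count += 1                                   # step 1
        dfnumber[v] = lowlink[v] = count
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:                           # step 2
            if dfnumber[w] == 0:                     # step 2.1
                search(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:                      # step 2.2
                lowlink[v] = min(lowlink[v], dfnumber[w])
        if lowlink[v] == dfnumber[v]:                # step 4: v is a root
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.insert(0, comp)   # placed on top of the list

    for v in graph:
        if dfnumber[v] == 0:
            search(v)
    return components


example_graph = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
example_components = strong(example_graph)
```

The recursion depth equals the length of the longest depth-first path, so a very large graph may need an explicit stack instead of Python recursion.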

Algorithm 6.4 INDRSUB(RANGE,GI)

Input.

RANGE: A candidate range (a range set number).

Output.

GI: A subgraph which contains all the components in the range set of
RANGE and the components in the range sets of the subranges of
RANGE which can be included in the loop scope of RANGE.

1. Construct a subgraph GI which contains all the components in the
Component Graph which have an unscheduled dimension with the range
RANGE. GI is represented in a bit vector similar to INCMP. Set
GI(k)='1'B if the kth component is in the range set of RANGE. The
edges from these nodes are given in CEDGES.

2. If RANGE has no subranges, return GI as the result. This
information is stored previously in the SUBRNGR matrix, which shows the
subrange relationships.

3. Otherwise, repeat steps 4 to 8 for each immediate subrange RNGIK of
RANGE.

4. Call INDRSUB recursively with RNGIK as input parameter and GIK as
the output parameter. GIK will contain the components which can be
scheduled in the loop of RNGIK.

5. Call procedure CLOSURE to compute the closure of GIK in the
Component Graph, then put the closure into GIK.

6. Set the union of GI and GIK into GI. (Note that this may be
reversed in step 8.)

7. Call MAX_SCHED procedure to do a trial scheduling for subgraph GI.

8. If the subgraph GI cannot be scheduled completely, then at least
one node, and possibly more, will have to be physical. The
range specification of the subrange may also become necessary. We
therefore decided that in this case it is not worthwhile to merge the
range set of RNGIK with the range set of RANGE, and GIK is taken out
of GI.

9.  Return  GI  as  the  result. 

Algorithm 6.5 CLOSURE(COMPS)

Input.

COMPS(COMP_CNT): A bit vector with a set of components marked by
'1'B. Other components are marked by '0'B.

The algorithm also uses the global data structures (ACOMP and
CEDGES).

Output.

CCOMPS: A bit vector with the closure of the set of components in
the input marked by '1'B. Other components are marked by '0'B.

1. Create a bit vector NACOMP (size COMP_CNT) with the components in
ACOMP marked except that the components in COMPS are merged into one
component. This also involves creating a vector NCEDGES similar to
CEDGES except reflecting the merger of the components in COMPS.

2. Find all the MSCCs in the new component graph (consisting of the new
vectors NACOMP and NCEDGES).

3. Locate the MSCC which includes the components in COMPS.

4. Construct CCOMPS, a bit vector (size COMP_CNT), with all the
components in the MSCC marked. This is the closure set of the
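
The effect of CLOSURE can be illustrated without rebuilding the merged graph. The Python sketch below is not the procedure described in steps 1 to 4; it swaps in an equivalent reachability formulation: a component belongs to the MSCC of the merged component exactly when it can be reached from some member of COMPS and can itself reach some member of COMPS. The function name closure and the dictionary encoding of ACOMP and CEDGES are assumptions made for illustration.

```python
def closure(comps, acomp, cedges):
    """Closure of the component set `comps` in the component graph.

    comps:  iterable of component numbers to be merged
    acomp:  dict, component number -> True if the component still exists
    cedges: dict, component number -> iterable of successor components
    """
    live = {c for c, alive in acomp.items() if alive}
    succ = {c: {s for s in cedges.get(c, ()) if s in live} for c in live}
    pred = {c: set() for c in live}
    for c, targets in succ.items():
        for s in targets:
            pred[s].add(c)

    def reach(start, adj):
        """All components reachable from `start` along `adj`."""
        seen, todo = set(start), list(start)
        while todo:
            c = todo.pop()
            for n in adj[c]:
                if n not in seen:
                    seen.add(n)
                    todo.append(n)
        return seen

    start = set(comps) & live
    return start | (reach(start, succ) & reach(start, pred))


example_acomp = {c: True for c in range(1, 6)}
example_cedges = {1: [2], 2: [3], 3: [4], 4: [], 5: [1]}
example_closure = closure({1, 3}, example_acomp, example_cedges)
```

In the example, component 2 lies on a path from component 1 to component 3, so merging 1 and 3 without it would create a cycle; the closure therefore adds it.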


Algorithm 6.6 MAX_SCHED

Input.

INCMP: A bit vector where a set of yet unscheduled components is
marked by '1'B. Other scheduled components have a value '0'B.
Note that these unscheduled components are the basic MSCCs found
by STRONG. The function of MAX_SCHED is to schedule as many of
the marked components as possible.

MERGCMP: A bit vector with the closure of a range set marked by
'1'B.

RANGE: The candidate range (range set number).

Output.

COMPS: A bit vector with the components, which have been trial
scheduled in a loop, marked by '1'B.

POSITION: A vector (size is DICTIND, the number of nodes in the
dictionary). The position in each scheduled node of the
distinguished dimension that corresponds to the loop parameter.

1. Initialize the POSITION entries to 0.

2. For each component i, if INCMP(i)='1'B (i.e. it is not yet
scheduled) and MERGCMP(i)='1'B (i.e. it is in the closure set), then
search the CEDGES vector and set PREDCNT(i) to the number of
predecessors in MERGCMP. If PREDCNT(i)=0 then put component i into
a list of candidates to be trial scheduled.

3. Repeat steps 4 to 9 until the list (referred to in step 2) is empty.
The function of steps 4 to 8 is to merge one component from the list
into the loop scope represented by COMPS; step 9 then updates the
candidate list.

4. Remove a component, say Ci, from the list. Search through the
NODELST of Ci. If there exists a node v with POSITION(v)>0 (i.e.
its distinguished dimension has been determined in a previous
iteration), then set FIRSTNODE=v, and go to step 7.

5. Else, arbitrarily pick any node of the component. Let it be denoted
by v. Set FIRSTNODE=v.

6. Search the subscript list of node v until finding a dimension j that
has not been scheduled in a loop scope (i.e. IDWITH=0) and whose
range is the same as the RANGE parameter. If found, then set
POSITION(v)=j. If none is found then this component should not be
scheduled in the loop scope. Therefore go to the next iteration (i.e.
end of step 9).

7. Propagate the distinguished dimension of node v repeatedly until all
the nodes in Ci have their distinguished dimensions defined. During
each propagation step:

7.1 Propagate the distinguished dimension forward along the edges
originating from node v to all the nodes at the terminating end
of the edges.

7.2 If the node to which a distinguished dimension is propagated
does not belong to Ci then do not propagate the
distinguished dimension any further forward from this node.

7.3 If propagation is not possible to any node in Ci because of a type
4 subscript expression then the current iteration may be
terminated, i.e. go to end of step 9.

8. The current component can be merged into the loop scope. Set
COMPS(i)='1'B.

9. Search through the list pointed to by CEDGES(i). For every edge
from Ci to Ck set PREDCNT(k)=PREDCNT(k)-1. If PREDCNT(k)=0,
INCMP(k)='1'B, and MERGCMP(k)='1'B, then put Ck into the candidate
list.
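
The control flow of MAX_SCHED, apart from the dimension propagation, is a predecessor-count traversal in which a rejected component never releases its successors. The Python sketch below shows only that skeleton. It makes two assumptions that are not in the source: the whole of steps 4 to 7 is replaced by a caller-supplied predicate can_schedule, and PREDCNT counts only predecessors that are themselves both unscheduled and in the closure set.

```python
def max_sched(incmp, mergcmp, cedges, can_schedule):
    """Trial-schedule as many marked components as possible.

    incmp, mergcmp: dicts, component number -> bool
    cedges:         dict, component number -> iterable of successors
    can_schedule:   hypothetical predicate standing in for steps 4 to 7
    Returns the set of components accepted into the loop scope (COMPS).
    """
    eligible = {c for c in incmp if incmp[c] and mergcmp.get(c, False)}
    predcnt = {c: 0 for c in eligible}              # step 2
    for c in eligible:
        for k in set(cedges.get(c, ())):
            if k in predcnt and k != c:
                predcnt[k] += 1
    candidates = [c for c in sorted(eligible) if predcnt[c] == 0]
    comps = set()
    while candidates:                               # step 3
        c = candidates.pop(0)                       # step 4
        if not can_schedule(c):
            continue        # step 9 is skipped: successors stay blocked
        comps.add(c)                                # step 8
        for k in set(cedges.get(c, ())):            # step 9
            if k in predcnt and k != c:
                predcnt[k] -= 1
                if predcnt[k] == 0:
                    candidates.append(k)
    return comps


example_incmp = {1: True, 2: True, 3: True, 4: True}
example_mergcmp = {1: True, 2: True, 3: True, 4: False}
example_cedges = {1: [2], 2: [3], 3: [], 4: [1]}
all_ok = max_sched(example_incmp, example_mergcmp, example_cedges,
                   lambda c: True)
blocked = max_sched(example_incmp, example_mergcmp, example_cedges,
                    lambda c: c != 2)
```

When component 2 is rejected, component 3 is never considered, because its only predecessor inside the closure set was not merged into the loop scope.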

Algorithm 6.7 EVALUATE

Function: Given a loop scope, compute the resulting penalty in use of
memory. This procedure is called after each trial schedule for
a range candidate and again after the final schedule has been
selected.

Input.

COMPS: A bit vector of size COMP_CNT with the bits corresponding to
components in a loop scope equal to '1'B.

EVAL_SET: A bit denoting whether EVALUATE is called to evaluate the
memory penalty of a trial schedule or for the selected schedule,
in which case the selected memory allocations are recorded in
STOTYP.

Output.

PENALTY: The memory penalty of the loop scope, in bytes.

Data structure.

SRCPHY, TGTPHY: When an edge in an Array Graph crosses a boundary of
a loop scope then, depending on the type of the edge, the memory
allocation for the data node at the origin or terminating ends
of the edge may have to be physical. The SRCPHY bit vector
denotes for each type of edge (there are 28 types) whether the
memory allocated to the node at the origin end of the edge (the
source node) must be physical. Similarly, the TGTPHY vector
refers to the node at the terminating end of the edge (the
target node).

MRAL: The memory requirement, in bytes, after the loop is formed.

MRIC: The memory requirement in the ideal case.

STOTYP: A field in the data structure LOCAL_SUB. For a virtual
dimension, STOTYP=0. For a dimension with a window of width k+1,
STOTYP=k+1. For a physical dimension with upper bound u,
STOTYP=-u.

1. Repeat steps 2 to 5 for every edge in the Array Graph. Each
iteration computes the effect of the edge on use of memory.

2. If the source and the target nodes of the edge are both in COMPS,
this is an internal edge. Go to step 5 to examine the subscript
expression of the edge to determine its effect on use of memory.

3. If neither the source nor the target node of the edge is in
COMPS, then this edge has no effect on memory usage. Go to the end of
the iteration (at end of step 5).

4. If none of the above then this edge crosses the loop boundary. In
this case, if SRCPHY(EDGE_TYPE)=1, then the distinguished dimension
of the source node must be physical. If TGTPHY(EDGE_TYPE)=1, then
the distinguished dimension of the target node must be physical.
The respective node numbers and the requirements for physical memory
allocation are stored in a list. Also in this case go to the end of
the iteration (at end of step 5).

5. If the subscript expression is of the form I-k and
SRCPHY(EDGE_TYPE)=1, then the memory allocation for the
distinguished dimension of the source node must be a window of width
k+1. This is also stored in the list.

6. PENALTY is initialized to zero.

7. Repeat steps 8 to 11 for every node in the above list. These nodes
have either a physical or a window of width k+1 memory allocation. An
iteration computes the memory requirement for a respective node.

8. In the case of a physical distinguished dimension, compute MRAL as
the product of all the ranges of the unscheduled node subscripts.
In the case of a window of width k+1 for the distinguished
dimension, compute MRAL as the product of k+1 and the ranges of the
other unscheduled node subscripts.

9. To compute MRIC it is necessary to scan each unscheduled node
subscript. If its storage type STOTYP is 0, then the ideal memory
requirement for this dimension is one. If STOTYP<0, the memory
allocation has previously been determined as physical, and the
ideal memory requirement is -STOTYP (u). MRIC is the product of
these ideal ranges.

10. The penalty for the array node is ND_PENALTY=(MRAL-MRIC)*(length of
node element in bytes).

11. PENALTY=PENALTY+ND_PENALTY.

12. If EVAL_SET='1'B then, if the distinguished dimension is physical,
STOTYP in every unscheduled dimension is set to the negative of
its range. If the distinguished dimension is a window of width k+1
then STOTYP of the distinguished dimension is k+1 and for the other
unscheduled dimensions STOTYP is the negative of their respective
range.
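
The arithmetic of steps 8 to 10 can be checked with a small worked example. The Python sketch below computes the penalty of a single array node; the function name node_penalty and its parameter layout are assumptions made for illustration. The text does not say how a subscript with a positive STOTYP (an existing window) contributes to MRIC, so the sketch assumes it contributes its window width, and this is marked in the code.

```python
from math import prod


def node_penalty(ranges, stotyp, dist, window, elem_len):
    """Memory penalty, in bytes, for one array node (steps 8 to 10).

    ranges:   range (extent) of each unscheduled subscript of the node
    stotyp:   current STOTYP of each of those subscripts
    dist:     index in `ranges` of the distinguished dimension
    window:   None for a physical dimension, else the window width k+1
    elem_len: length of one node element in bytes
    """
    # Step 8: memory requirement after the loop is formed (MRAL).
    actual = list(ranges)
    if window is not None:
        actual[dist] = window
    mral = prod(actual)
    # Step 9: memory requirement in the ideal case (MRIC).
    # STOTYP = 0 gives 1, STOTYP < 0 gives -STOTYP; a positive STOTYP
    # is assumed here to contribute its window width.
    mric = prod(1 if s == 0 else abs(s) for s in stotyp)
    # Step 10: penalty for this node.
    return (mral - mric) * elem_len


# A 100 x 20 array of 4-byte elements, distinguished dimension first.
physical = node_penalty([100, 20], [0, 0], 0, None, 4)
windowed = node_penalty([100, 20], [0, 0], 0, 3, 4)
partly_fixed = node_penalty([100, 20], [0, -20], 0, None, 4)
```

With both dimensions virtual in the ideal case, making the distinguished dimension physical costs (2000-1)*4 bytes, while a window of width 3 costs only (60-1)*4 bytes.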

Algorithm  6.8  EXTRACT 

Function: To obtain the for-element for a loop, including the schedule
elements for the body of the loop scope.

Input. 

SUBGRAPH:  A  pointer  to  a  reduced  Array  Graph  of  the  component 
scheduled  into  one  loop  scope. 

SVPOSITION: A vector with an element for every node in the SUBGRAPH.
Each element has the value of the dimension number of the
distinguished dimension of the respective node.

L  :  The  nesting  level. 

Output. 

A  for-element  which  is  the  schedule  of  the  input  graph. 

1. Allocate a for-element. Set FOR_NAME to the loop parameter name and
FOR_RANGE to the range set number of the loop parameter.

2. If the current loop range has some immediate subranges, then call
procedure COND_GRAPH and upon return go to step 7. COND_GRAPH takes
over all further scheduling of a body of a loop which contains
conditionally executable nodes due to use of indirect subscripting.

3.  Delete  all  the  edges  from  the  graph  with  distinguished  dimension 
subscript  expressions  of  type  2  or  3.  The  precedence  expressed  by 
these  edges  is  assured  by  the  order  of  the  iterations. 

4. Set IDWITH of the distinguished dimension of all the nodes in the
subgraph to L, the nesting level of the current loop.

5.  Call  SCHEDULE_GRAPH,  with  SUBGRAPH  and  L+l  as  the  parameters,  to  get 
the  schedule  of  the  resulting  graph. 

6. Set ELMNT_LIST in the for-element structure to point to the schedule
returned from step 5.

7.  Return  the  for-element  as  output. 


Algorithm 6.9 COND_GRAPH(TOP_RANGE,SGRAPH)

Function: To obtain the schedule elements of the body of a loop scope,
which includes cond-elements.

Input.

TOP_RANGE: The range set number of the highest level major range in
the SGRAPH.

SGRAPH: A graph to be scheduled within an iteration block of the
range TOP_RANGE.

Output. A schedule for SGRAPH.

1. Scan all edges in SGRAPH. If an edge has a subscript expression in
the distinguished dimension of types 2, 3, 6, or 7, and either the
source or the target node has the TOP_RANGE range, then delete
this edge from SGRAPH.

2. If node X is the indirect indexing vector serving to reduce the range
TOP_RANGE to a subrange RNGIK, then draw an edge from X to all the
nodes in the range set of RNGIK.

3. Call procedure STRONG to form a Component Graph for SGRAPH,
consisting of ACOMP, INCMP, CEDGES, and NODELST. ACOMP and INCMP
are bit vectors (the size is the number of MSCCs found by STRONG).
These vectors are all of the value '1'B.

4. For every subrange RNGIK of TOP_RANGE, merge all the components in
the range sets of RNGIK or its direct and indirect subranges into
one component. Set the INCMP vector elements of the merged
components to '0'B.

5. Repeat steps 6 to 9 until all the elements in INCMP are '0'B. Each
iteration merges a group of components with TOP_RANGE range.

6. Call CLOSURE with INCMP to obtain the closure set MERGE_CMP.

7. Call MAX_SCHED with INCMP, MERGE_CMP, and TOP_RANGE. It returns
CCOMPS.

8. Merge the components in CCOMPS into one component, updating NODELST,
CEDGES, ACOMP, and INCMP.

9. Set the element of INCMP corresponding to the merged schedule to
'0'B.

10. Repeat steps 11 to 13 for the components in ACOMP.

11. Select the next component in ACOMP in a topologically sorted order.
Let this component be COMPI.

12. Let RNGIK be the range of the component COMPI. If RNGIK=TOP_RANGE,
then mark the distinguished dimension of each node in the component
as scheduled and call procedure SCHEDULE_GRAPH to get a schedule for
this component. Go to step 14.

13. Otherwise, allocate a cond-element to this component. Call
procedure COND_GRAPH recursively with RNGIK and COMPI as the input
parameters to get a schedule for the conditional element.

14. Return the schedule elements obtained as the final schedule of
SGRAPH. Note that the order of the schedule elements was determined
by the selection of components in a topologically sorted order in
step 11. The schedule elements are obtained either in step 12 or
13, depending on whether they are other elements or cond-elements
respectively.


CHAPTER 7
CODE  GENERATION 


7.1 OVERVIEW OF THE CODE GENERATION PROCESS

Code Generation is the last phase of the Processor. It uses the
data structures generated in Array Graph construction, specification
analysis, and program scheduling. As shown in Fig. 7.1 the code
generation process accepts two inputs: the program schedule created in
the scheduling phase and attribute tables produced in the analysis
phase. Recall that the program schedule is an ordered sequence of
schedule elements described in section 6.6. The nodes referenced in
schedule elements can be found in the dictionary, together with the
attributes of the respective nodes, which are described in
section 4.2.1. The output is a complete PL/I program ready for
compilation. The executable PL/I code is written out to the "PL1EX"
file. The PL/I "ON" conditions are written to the "PL1ON" file and the
PL/I code for declaring the object data items is written to a "PL1DCL"
file.


[Figure: the Program Schedule and the Attribute Tables are the two
inputs to code generation.]

Fig. 7.1 Overview of the Code Generation Phase


Fig. 7.2 shows the overall organization of the code generation
process, consisting of the main procedure CODEGEN which in turn calls on
other procedures to perform certain tasks. The PL/I executable code is
generated by the GENERATE procedure which examines the elements of the
schedule one at a time, and invokes the procedures that are
indicated by the types of program events. The GPL1DCL procedure generates
data declarations. GENERATE calls GEN_NODE to generate statements
for node-elements of the schedule. GEN_NODE calls on GENIOCD for
input-output operations and on GENASSR for assertions. GENERATE also
calls GENDO and GENEND for generating iteration control structures for
for-elements, and on COND_BUC and COND_END for generating conditional
block statements for cond-elements. These procedures are briefly
reviewed in section 7.2. They are described in greater detail together
with other auxiliary tasks in the subsequent sections.


7.2  THE  MAJOR  PROCEDURES  FOR  CODE  GENERATION 


7.2.1  CODEGEN  -  THE  MAIN  PROCEDURE 

CODEGEN starts with opening the output files PL1EX, PL1ON, and
PL1DCL. It next generates code that will handle program errors. Most
of these errors are due to input data errors discovered by data type
conversions in the program. The user can also define additional error
conditions. The statements written to the PL1EX file are as follows:

    ALLOCATE ERROR, ACC_ERROR ;
    ACC_ERROR = '0'B ;
    ALLOCATE $ERR_LAB ;
    $ERR_LAB = END_PROGRAM ;

The declarations written to the PL1DCL file are as follows:

    DCL (ERROR, ACC_ERROR, NOT_DONE) CTL BIT(1) ;
    DCL $ERR_LAB LABEL CTL ;

Finally the ON condition code is sent to the PL1ON file as follows:

    ON ERROR
    BEGIN ;
      /* write erroneous input record to ERRORF file */
      WRITE FILE(ERRORF) FROM($ERROR_BUF) ;
      ERROR = '1'B ;      /* set error flag */
      GO TO $ERR_LAB ;    /* go to end of loop where */
    END ;                 /* error was detected */

    ERROR_RESTART:

CODEGEN  next  passes  the  entire  program  schedule  to  GENERATE,  which 
will  generate  the  portions  of  the  program  for  the  schedule  elements. 
When this is completed CODEGEN passes the attribute tables to GPL1DCL to
generate data declarations. Finally CODEGEN calls on MERGEPL1 to merge
the three output files.


7.2.2  GENERATE  -  INTERPRETING  SCHEDULE  ELEMENTS 

This recursive procedure scans the schedule given by the list of
schedule  elements,  LIST,  for  a  loop  nesting  level  LEVEL.  To  start  with, 
CODEGEN  passes  the  whole  schedule  at  level  0.  In  subsequent  calls 
GENERATE  will  receive  a  schedule  of  a  loop  scope  at  each  nesting  level. 
GENERATE  calls  lower  level  procedures  to  process  the  different  types  of 
schedule  elements  as  follows: 

1.  Scan  each  element  of  the  list  LIST.  For  each  element  perform  steps  2 
to  4. 

2.  If  the  element  is  a  node-element  call  GEN_NODE  which  will  generate 
the  code  for  the  schedule  element. 

3.  If  the  element  is  a  for-element  do  the  following: 

3.1 Call GENDO to produce the code for opening a loop.

3.2 Call GENERATE recursively with the list of the elements within
the loop's scope and level = LEVEL+1.

3.3 Call GENEND to generate the termination of the loop.

4. If the element is a cond-element do the following:

4.1 Call COND_BUC to produce the code for opening a conditional
block.

4.2 Call GENERATE recursively with the list of the elements within
the condition block and level = LEVEL.

4.3 Call COND_END to generate the termination of the conditional
block.
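
The recursion in steps 1 to 4 can be sketched as a short tree walk. The Python below is an illustration under stated assumptions, not the Processor's code: schedule elements are encoded as tuples ("node", n), ("for", name, body) and ("cond", body), an encoding invented here, and instead of emitting PL/I the walk records which of the procedures named in the text would be called and at which level.

```python
def generate(schedule, level=0, out=None):
    """Walk a schedule and record the emitter that would be called.

    schedule: list of ("node", n), ("for", name, body) or ("cond", body)
    level:    loop nesting level of this schedule list
    Returns a list of (level, procedure name, argument) tuples.
    """
    if out is None:
        out = []
    for elem in schedule:                        # step 1
        kind = elem[0]
        if kind == "node":                       # step 2
            out.append((level, "GEN_NODE", elem[1]))
        elif kind == "for":                      # step 3
            out.append((level, "GENDO", elem[1]))
            generate(elem[2], level + 1, out)    # step 3.2: LEVEL+1
            out.append((level, "GENEND", elem[1]))
        elif kind == "cond":                     # step 4
            out.append((level, "COND_BUC", None))
            generate(elem[1], level, out)        # step 4.2: same LEVEL
            out.append((level, "COND_END", None))
        else:
            raise ValueError(f"unknown schedule element: {kind!r}")
    return out


example_schedule = [
    ("node", 1),
    ("for", "I", [("node", 2), ("cond", [("node", 3)])]),
]
example_calls = generate(example_schedule)
```

Note that a conditional block does not raise the nesting level, whereas a loop does.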


7.2.3 GENDO - TO INITIATE THE SCOPE OF ITERATIONS

This procedure produces the code for a control statement initiating
an iteration loop. The loop variable name FORNAME and the termination
criterion are taken from the fields FOR_NAME and FOR_RANGE in the
for-element being scanned.

The  following  instructions  are  intended  for  recovery  from  a  program 
error.  They  always  precede  each  loop  control  statement: 

    ALLOCATE ERROR, ACC_ERROR ;
    /* reset accumulative error flag */
    ACC_ERROR = '0'B ;
    ALLOCATE $ERR_LAB ;
    $ERR_LAB = LOOP_ENDc ;

The "c" following LOOP_END is a unique number assigned to the loop. The
purpose of these statements is to ensure that an error occurring within
the loop scope will cause control to be directed to LOOP_ENDc, which is
a label immediately preceding the end of the loop.

The  DO-statement  itself  is  constructed  next.  Two  basic  forms  for 
the  loop  control  statements  are  used: 

1)
    DO name = 1 TO upper [ WHILE (condition) ] ;

2)
    name = 0 ;
    DO WHILE (condition) ;
    name = name+1 ;

"name" is the loop variable, "condition" is the termination condition.

If the termination criterion given is that of a fixed upper limit
or given through a SIZE variable, the first form is used and "upper" is
either a constant number or a variable of the form SIZE$X.

If the range is specified by an END.X control variable, the second
form of loop control is used. In this case we use NOT_DONE in the
condition and the following statements are generated before the
beginning of the loop:

ALLOCATE NOT_DONE ;

NOT_DONE = '1'B ;

NOT_DONE will be reset to '0'B whenever the appropriate END.X variable
is set to 'true'.

If there is an end-of-file condition associated with this iteration,
either as the main termination condition, or because this is an
iteration on an input record or group above the record level which are
last in their peer group, we add:

¬ENDFILE$file

to the condition "condition".

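
As a hedged illustration of the choice between the two loop forms, the
following Python sketch emits the statements described above as
strings. The function name, its parameters, and the relative order of
the error-recovery and NOT_DONE statements are assumptions of the
sketch, not taken from the Processor.

```python
# Illustrative sketch only; parameter names are hypothetical.

def gendo(name, loop_id, upper=None, uses_end_var=False, eof_file=None):
    """Return the PL/I-like lines that open one iteration loop."""
    lines = [
        "ALLOCATE ERROR, ACC_ERROR ;",
        "ACC_ERROR = '0'B ;",
        "ALLOCATE $ERR_LAB ;",
        f"$ERR_LAB = LOOP_END{loop_id} ;",
    ]
    conds = []
    if uses_end_var:                 # range given by an END.X variable
        lines += ["ALLOCATE NOT_DONE ;", "NOT_DONE = '1'B ;"]
        conds.append("NOT_DONE")
    if eof_file is not None:         # end-of-file also ends the loop
        conds.append(f"¬ENDFILE${eof_file}")
    cond = " & ".join(conds)
    if upper is not None and not uses_end_var:
        # form 1: fixed upper limit or SIZE variable
        tail = f" WHILE ({cond})" if cond else ""
        lines.append(f"DO {name} = 1 TO {upper}{tail} ;")
    else:
        # form 2: open-ended loop with an explicit counter
        if not cond:
            raise ValueError("second form needs a termination condition")
        lines += [f"{name} = 0 ;",
                  f"DO WHILE ({cond}) ;",
                  f"{name} = {name}+1 ;"]
    return lines


first = gendo("I", 3, upper="SIZE$A")
second = gendo("J", 4, uses_end_var=True, eof_file="F")
```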


7.2.4 GENEND - TO TERMINATE THE SCOPE OF ITERATIONS

This procedure produces the code needed at the end of the loop
scope. Since at times we use k+1 locations to store a window of size
k+1 of an array, it is necessary on each iteration to shift the window
by one element position. This is done at the end of the iteration. The
size of the respective window is originally stored in STOTYP of the node
subscript of each array node. GENERATE passes the node numbers of
arrays using window dimensions in a list called PREDLIST to GENEND.
Based on this list GENEND generates statements to shift the window by
one element position. The actual range declared for a window dimension
is k+1. In each iteration we compute (or read) A(..., k+1, ...) and may
refer to the previous element as A(..., k, ...). When an iteration is
completed we transfer A(..., I+1, ...) to A(..., I, ...) for I from 1 to
k.

After producing a sequence of these shifting operations we produce
the label:

LOOP_ENDc: ;

where  "c"  is  the  unique  count  associated  with  the  current  loop.  If  the 
termination  criterion  for  the  loop  was  through  an  END.X  control  variable 
we  also  produce  the  code: 

IF END.X THEN NOT_DONE = '0'B ;

This  has  to  be  done  at  the  end  of  the  loop  since  the  value  of  END.X  at  a 
given  iteration  determines  whether  this  iteration  will  be  the  last. 

After this we produce the following statements:

$TMP_ERROR = ACC_ERROR ;

FREE ERROR, ACC_ERROR ;

FREE $ERR_LAB ;

IF $TMP_ERROR THEN ERROR, ACC_ERROR = '1'B ;

If  the  termination  criterion  was  through  an  END.X  control  variable 
we  also  produce: 

FREE  NOT_DONE  ; 
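
The window shift performed at the end of each iteration can be sketched
in Python as follows. This is only an illustration under simplifying
assumptions: a one-dimensional window, and hypothetical function names
that do not appear in the Processor.

```python
# Illustrative sketch only; both function names are hypothetical.

def shift_window(window):
    """Shift a window of k+1 elements by one position, in place.

    Mirrors A(I) = A(I+1) for I = 1..k (1-based in the text).
    """
    k = len(window) - 1
    for i in range(k):              # 0-based positions 0..k-1
        window[i] = window[i + 1]
    return window


def genend_shift(array, k):
    """Emit PL/I-like shift statements for one windowed dimension."""
    return [f"{array}({i}) = {array}({i + 1}) ;" for i in range(1, k + 1)]


shifted = shift_window([10, 20, 30])   # window of size k+1 = 3
stmts = genend_shift("A", 2)
```

The assignments run in ascending order of I so that each element is
copied to its lower neighbour before it is overwritten.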


7.2.5 COND_BLK - INITIATE A CONDITIONAL BLOCK

This procedure produces the code necessary to initiate a
conditional block. The conditional block will be executed within the
iteration only when the value of the indirect subscript is increased.
The indirect subscript node number is stored in the FOR_RANGE field of
the cond-element being scanned. An IF-statement is generated to test
the above condition. Inside the conditional block we will use a new
symbol for the indirect subscript. For example, if X(I) is the indirect
subscript then we define a new subscript J=X(I). Let 'old-sub' denote
the subscript running in the major range, i.e. I. The 'new-sub'
denotes the new representation of the indirect subscript, i.e. J. A
boolean variable, $B_X, indicates whether the conditional block should
be executed. The code to compute $B_X is generated by GEN_NODE when the
node X is scanned in the schedule. The new-sub is of the form $Xn where
'n' is a unique number associated with this conditional block. The
following declaration statements are issued:

DCL $Xn FIXED BIN ;

DCL $B_X BIT(1) ;

The following code is then produced:

IF $B_X THEN DO ;

new-sub = X(..., old-sub) ;


7.2.6  COND_END  -  TERMINATE  A  CONDITIONAL  BLOCK 

This  procedure  produces  the  code  at  the  end  of  a  conditional  block. 
The above IF-statement has been generated by COND_BLK. Here we issue an
'END' statement to terminate the IF-statement.


7.3 GEN_NODE - CODE GENERATION FOR A NODE

This  procedure  generates  the  code  associated  with  a  schedule 
node-element.  It  branches  to  different  parts  according  to  the  types  of 
nodes. 


7.3.1  PROGRAM  HEADING 


If the node is a module name (type MODL) we produce the code:

name: PROCEDURE OPTIONS(MAIN) ;

This code is routed to the file PL1DCL.


7.3.2  FILES 

If the node is a file node (type FILE) we first generate three
names. "file_stem" is the file name with prefixes "NEW" or "OLD"
removed, if any. "name" is the full name of the node, including all
prefixes. "file_suff" is the file_stem with the suffix of 'S' for
source file, 'T' for target file, and 'O' for update file (both source
and target). The following declaration statements are routed to the
PL1DCL file.

DCL name_S CHAR(length) VARYING INIT(' ') ;

DCL name_INDX FIXED BIN ;

"length" is the maximum length of records in the file. "name_S" is the
name of a buffer into which records in the file are read. (It is
VARYING as the file may have more than one record type, with different
lengths.) "name_INDX" is a variable used to scan the buffer for packing
and unpacking the records (explained further later).

1.  If  the  file  is  an  input  file  we  produce  the  statement: 

OPEN FILE(file_suff) ;

2.  If  the  file  is  a  sequential  input  file  and  an  end-of-file  is  not 
explicitly  mentioned  by  the  user,  we  produce  the  declarations: 

DCL ENDFILE$file_stem BIT(1) INIT('0'B) ;

DCL $FSTfile_suff BIT(1) INIT('1'B) ;

routed to the PL1DCL file. If the user explicitly mentioned the
end-of-file variable then these statements will be generated when the
declarations are generated for all variables by GPL1DCL.

The statements:

ON ENDFILE(file_suff)

BEGIN ;

ENDFILE$file_stem = '1'B ;
name_S = COPY(' ',length) ;

END ;

are sent to the PL1ON file. The purpose of these statements is to have
the  file  buffer  filled  with  blank  characters  when  an  end  of  file 
condition  occurs. 

3.  If  the  file  is  an  output  file  we  produce  the  statement: 

CLOSE FILE(file_suff) ;


7.3.3  RECORDS 

If  the  node  is  a  record  (type  RECD)  we  call  GENIOCD  to  produce  the 
code  for  the  reading  or  writing  of  records. 


7.3.4  FIELDS 

To process fields GEN_NODE calls procedure GENITEM. GEN_NODE also
calls CHECK_VIRT to find if the node has a windowed dimension. If the
field node is an indirect subscript, X, the following code is issued.

IF loop_var=1 THEN DO ;

bname = '1'B ; rname = 0 ; END ;

ELSE IF X(loop_var)>X(loop_var-1) THEN DO ;
bname = '1'B ; rname = 0 ; END ;

ELSE DO ;

bname = '0'B ; rname = 1 ; END ;

where loop_var is the current level loop variable, bname is of the form
$B_X, and rname is of the form $R_X. Recall that bname indicates
whether the associated conditional block will be executed. rname will
be used to compute the index to reference an element such as
A(X(loop_var)) in the case that array A has a windowed dimension. This
is explained further later in connection with the code generation for
assertions.
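
The run-time effect of the code issued for an indirect subscript can be
traced with a small Python simulation. It is a sketch under the
assumption that the values X(1), X(2), ... never decrease; the function
name is hypothetical.

```python
# Illustrative simulation only; `indirect_flags` is a hypothetical name.

def indirect_flags(x):
    """Return ($B_X, $R_X) for each iteration over the values in x."""
    flags = []
    for i, value in enumerate(x):
        if i == 0 or value > x[i - 1]:
            flags.append((True, 0))    # new value: execute the block
        else:
            flags.append((False, 1))   # repeated value: skip the block
    return flags


trace = indirect_flags([1, 1, 2, 2, 3])
```

In this trace the conditional block would run on the first, third and
fifth iterations, which are the ones where X takes a new value.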


7.3.5 ASSERTIONS

If the node is an assertion call the procedure GENASSR to
produce the code for an assertion.


7.4  GENASSR  -  GENERATING  CODE  FOR  ASSERTIONS 


This procedure generates code for assertions. The main task of
GENASSR is to transform the syntax tree representation of the assertion
into a string representation acceptable to the PL/I compiler. The
transformation is carried out by a recursive climb on the syntax tree,
combining for each node the string representations of the descendant
subtrees into a string representation of the tree rooted at that node.
However, before performing the main task the procedure transforms
assertions containing conditional expressions into conditional
assertions. Thus, an assertion of the form:

Y = IF (IF X>0 THEN Y>0 ELSE Y<=0) THEN X*Y

ELSE -X*Y ;

will be transformed into:

IF X>0 THEN IF Y>0 THEN Y = X*Y ;
            ELSE Y = -X*Y ;
ELSE IF Y<=0 THEN Y = X*Y ;
     ELSE Y = -X*Y ;


The overall execution of GENASSR can therefore be summarized
as:

1. Transform assertions with conditional expressions into conditional
assertions.

2. Form the string representation of the assertion.


7.4.1  TRANSFORMING  CONDITIONAL  EXPRESSIONS 

This  task  is  carried  out  by  the  procedure  SCAN  which  uses  the 
auxiliary procedure EXTRACT_COND.


7.4.1.1 SCAN(IN)

The procedure SCAN effects the complete transformation of
assertions containing conditional expressions into conditional
assertions. The procedure is presented with an assertion pointed to by
IN, and returns a pointer to the transformed assertion. The steps in
this  procedure  are  as  follows: 

1.  Check  the  root  of  the  tree  pointed  to  by  IN  to  see  whether  it  is  a 
simple  assertion  or  a  conditional  assertion.  If  it  is  a  simple 
assertion  then  go  to  step  5. 

2. We check next if the conditional assertion contains conditional
expressions. A conditional assertion has the form:

IF COND THEN S1 ELSE S2

where S1, S2 are assertions.

SCAN calls EXTRACT_COND to check whether COND contains a conditional
expression. If COND contains a conditional expression, then
EXTRACT_COND returns C, L, and R which are the parts of COND as
follows:

COND = IF C THEN L ELSE R.

Otherwise, go to step 4.

3.  If  a  conditional  expression  is  found  in  COND  then: 

3.1 SCAN then transforms the tree (pointed to by IN) into a tree IN1
which is of the form:

IF C THEN IF L THEN S1
          ELSE S2
ELSE IF R THEN S1
     ELSE S2

3.2 SCAN calls SCAN(IN1) recursively to further search for
conditional expressions in IN1 and return a transformed
conditional assertion.

3.3  The  transformed  assertion  is  returned  by  SCAN. 

4.  If  COND  does  not  contain  embedded  conditional  expressions,  then  there 
are two recursive calls to SCAN for the assertions S1 and S2 in IN.
SCAN then returns the following assertion and exits.

IF COND THEN SCAN(S1) ELSE SCAN(S2)

5.  In  the  case  of  a  simple  assertion: 

Y = E.

SCAN calls EXTRACT_COND(E) to search for conditional expressions in
E. If none is found, then the assertion Y = E is returned unchanged;
otherwise, EXTRACT_COND returns C, L, and R which are the parts of E
as follows:

E = IF C THEN L ELSE R.

6. If E contains a conditional expression, then SCAN calls SCAN(IN2)
recursively, where IN2 points to a tree of the form:

IF C THEN Y = L
ELSE Y = R

The result of the recursive call on SCAN is returned by SCAN as the
transformed assertion.


7.4.1.2 EXTRACT_COND(ROOT, COND, LEFT, RIGHT)

This  procedure  identifies  and  extracts  the  leftmost  conditional 
expression  in  a  given  expression  pointed  to  by  ROOT. 

If  a  conditional  expression  is  found  the  (pointer  to  the)  condition 
is  returned  in  COND  and  its  first  (THEN)  and  second  (ELSE) 
subexpressions  returned  in  LEFT  and  RIGHT  respectively.  If  the  analyzed 
expression  contains  no  conditional  expression  the  procedure  returns  NULL 
in  COND. 

Its  operation  is  as  follows: 

1.  Inspect  the  top  level  node  of  the  given  syntax  tree. 

2. If it is a conditional expression, return respectively the condition,
the subexpression following THEN, and the subexpression following
ELSE, then exit.

3. If the expression is a simple expression, i.e. a constant or a
variable, return NULL and exit.

4. If the expression is a compound expression, scan each of its
descendants by calling EXTRACT_COND recursively. Consider the first
COND, LEFT, and RIGHT which are returned such that COND is not equal
to NULL. In general, a compound expression is of the form:

E = g(E1, ..., Em)

Assume that the recursive scanning of E1, ..., Em first produces a COND
not equal to NULL for Ei where 1<=i<=m, returning also the THEN and
ELSE subexpressions L and R respectively. Then the current call for
E returns:

COND as the condition,

g(E1, ..., Ei-1, L, ..., Em) as LEFT, and

g(E1, ..., Ei-1, R, ..., Em) as RIGHT.

Thus the overall effect of EXTRACT_COND on an expression E is to extract
a condition C if one exists in E (returned as COND), and then to compute
E1 when C is true, and E2 when C is false. E1 and E2 are returned in
LEFT and RIGHT respectively. Described in another way we look for C,
E1, and E2 such that the following equivalence holds:

E = IF C THEN E1 ELSE E2 .

In particular this gives:

g(E1, ..., Ei-1, (IF C THEN L ELSE R), ..., Em) =

IF C THEN g(E1, ..., Ei-1, L, ..., Em)

ELSE g(E1, ..., Ei-1, R, ..., Em).
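
The interplay of SCAN and EXTRACT_COND can be sketched in Python. The
tuple encoding of syntax trees used here is an assumption made for the
illustration: a string is a constant or variable, ("op", g, [args]) is
a compound expression, ("if", c, l, r) is a conditional expression,
("assign", y, e) is a simple assertion and ("ifa", cond, s1, s2) is a
conditional assertion. None of these names come from the Processor.

```python
# Illustrative sketch only; the tree encoding is hypothetical.

def extract_cond(e):
    """Return (C, LEFT, RIGHT) for the leftmost conditional, else None."""
    if isinstance(e, str):                  # constant or variable
        return None
    if e[0] == "if":                        # IF C THEN L ELSE R
        return e[1], e[2], e[3]
    _, g, args = e                          # compound: g(E1, ..., Em)
    for i, arg in enumerate(args):
        found = extract_cond(arg)
        if found is not None:
            c, l, r = found
            left = ("op", g, args[:i] + [l] + args[i + 1:])
            right = ("op", g, args[:i] + [r] + args[i + 1:])
            return c, left, right
    return None


def scan(a):
    """Turn conditional expressions into conditional assertions."""
    if a[0] == "assign":                    # simple assertion Y = E
        _, y, e = a
        found = extract_cond(e)
        if found is None:
            return a
        c, l, r = found
        return scan(("ifa", c, ("assign", y, l), ("assign", y, r)))
    _, cond, s1, s2 = a                     # IF COND THEN S1 ELSE S2
    found = extract_cond(cond)
    if found is not None:
        c, l, r = found
        return scan(("ifa", c, ("ifa", l, s1, s2), ("ifa", r, s1, s2)))
    return ("ifa", cond, scan(s1), scan(s2))


# The example from section 7.4:
# Y = IF (IF X>0 THEN Y>0 ELSE Y<=0) THEN X*Y ELSE -X*Y
mul = ("op", "*", ["X", "Y"])
neg = ("op", "-", [mul])
inner = ("if", ("op", ">", ["X", "0"]),
         ("op", ">", ["Y", "0"]), ("op", "<=", ["Y", "0"]))
result = scan(("assign", "Y", ("if", inner, mul, neg)))
```

Applied to the example, the sketch yields a conditional assertion on
X>0 whose two branches are conditional assertions on Y>0 and Y<=0, each
choosing between Y = X*Y and Y = -X*Y.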


7.4.2  PRINT  -  TRANSFORMING  THE  ASSERTION  INTO  STRING  FORM 

This  procedure  is  presented  with  a  pointer  to  an  assertion  syntax 
tree  and  it  converts  the  assertion  tree  into  a  string  representation. 

The  procedure  branches  according  to  the  types  of  the  nodes  in  the 
assertion  tree. 

1. If the node is a subscripted variable A(E1, ..., Em) we generate the
string 'A('. We then scan each of the subscript expressions E1 to Em
and add them to the string according to the following subcases:

1.1 If the dimension at position i corresponds to the dimension
declared for repetition of a record and the variable A includes
the prefix 'NEXT', then

1.1.1 If the dimension is scheduled as a window of width k+1 we
insert the subscript value k+2.

1.1.2 If the dimension is scheduled as physical and the
expression Ei is a constant c, then insert the value of
c+1. (See further below.)

1.1.3 If the dimension is scheduled as physical and Ei is an
expression we call PRINT(Ei) and insert the returned value
concatenated with '+1'.

1.2 If the dimension at position i is scheduled as a window of width
k+1, the physical allocation for the array dimension is k+2
elements, with the (k+1)th element standing for the current
value and the (k+2)th element standing for the field in the next
record. The different subscript expressions are handled as
follows:

1.2.1 If it is a simple subscript then we insert the integer k+1
as the subscript.

1.2.2 If the subscript expression is I-c, then the integer k+1-c
is inserted.

1.2.3 If the subscript expression is X(I), then k+1-$R_X is
inserted, where k+1-$R_X points to the element A(X(I)). If
X(I)=X(I-1) then $R_X is equal to 1, and if X(I)>X(I-1)
then $R_X is equal to 0. (The code to compute $R_X is
generated by GEN_NODE right after node X is scanned.)

1.2.4 If the subscript expression is X(I)-c, then k+1-$R_X-c is
inserted as the subscript.

1.2.5 If the subscript expression is X(I-a), then
k-[X(I-1)-X(I-a)] is inserted as the subscript.
X(I-1)-X(I-a) is the offset of A(X(I-a)) from A(X(I-1)), which
is stored in the kth element of the window for the ith
dimension of array A.

1.2.6 If the subscript expression is X(I-a)-c, then
k-[X(I-1)-X(I-a)]-c is inserted as the subscript.

1.3 If the ith dimension of array A is physical and Ei is the
subscript expression, we call PRINT(Ei) and insert the returned
value.

2.  For  all  other  compound  nodes  we  call  PRINT  recursively  to  convert  the 
descendants  and  insert  between  them  the  string  representation  of  the 
separators, operators, and delimiters. The latter are stored in the
OP_CODE  fields  as  integer  codes.  The  integer  codes  are  translated 
into  the  operator  representation  using  the  array  KEYS  and  then 
inserted. 

3.  For  atomic  nodes  we  use  the  variable  name  either  directly  or  through 
its  node  number.  Loop  variables  (subscripts)  are  accessed  through 
the level indication available in their IDWITH field which is used as
an  index  to  the  array  LOOP_VARS.  Function  names  are  retrieved  by 
their  function  number  indexing  the  table  FCNAMES. 
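
As a hedged illustration of subcases 1.2.1 to 1.2.4, the following
Python sketch maps a subscript form to the subscript string inserted
for a window of width k+1. The form labels passed as strings and the
function name are assumptions of the sketch; the cases involving
X(I-a) are left out.

```python
# Illustrative sketch only; covers subcases 1.2.1 to 1.2.4.

def window_subscript(k, form, c=0):
    """Return the subscript text for a windowed dimension of width k+1."""
    if form == "I":                 # simple subscript
        return str(k + 1)
    if form == "I-c":               # constant offset into the window
        return str(k + 1 - c)
    if form == "X(I)":              # indirect subscript, run-time $R_X
        return f"{k + 1}-$R_X"
    if form == "X(I)-c":
        return f"{k + 1}-$R_X-{c}"
    raise ValueError(f"unsupported subscript form: {form}")


current = window_subscript(2, "I")
previous = window_subscript(2, "I-c", c=1)
indirect = window_subscript(2, "X(I)")
```

For direct subscripts the position in the window is known when the code
is generated, so a plain integer is inserted. For indirect subscripts
part of the position depends on $R_X and is evaluated at run time.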


7.5  GENIOCD  -  GENERATING  INPUT/OUTPUT  CODE 

GENIOCD is invoked by CODEGEN upon scanning a schedule element
which corresponds to a record node. It accepts as input the node number
in the schedule element. GENIOCD generates PL/I READ, WRITE, or REWRITE
statements with the appropriate parameters, based on the attributes of
the file, as well as the control code or condition code associated with
the input/output operation.

Table 7.1 summarizes the different statements generated by GENIOCD
for the different cases. Each of the different cases in Table 7.1 shows
the conditions defining the case and the statements which are generated
for the case. The upper case letters represent the part of the actual
PL/I string being generated, whereas the lower case letters are the
metanames of the items obtained from the program schedule elements.

Several preparatory steps are taken before branching to the
different cases.

1. Definition of names: We generate several variable names derived from
the record name that will be used in the code. Let the record name
be designated by rec.

1.1 If rec is of the form OLD.X or NEW.X we define recname as OLD_X
or NEW_X respectively.

1.2 Otherwise we define recname as rec.

1.3 Recbuf is defined as recname_S.

1.4 Recindx is defined as recname_INDX.

Consider now the file which is parent to rec. Let it be denoted by
fil.

1.5 Set file_name to fil.

1.6 If fil is of the form OLD.X or NEW.X set file_name to OLD_X or
NEW_X respectively and file_suff to file_nameO.

1.7 Otherwise set file_suff to file_nameS if the file is a source
and to file_nameT if the file is a target.

1.8 Set eof to ENDFILE$file_name.

1.9 Retrieve the keyname associated with the record, if one exists,
and assign it to key_name.

1.10 Set found to FOUND$file_name.

2. Issue the following declarations.

DCL recbuf CHAR(len_dat(n)) ;
DCL recindx FIXED BIN INIT(1) ;

This declares a buffer for the record into which and out of which the
information will be read or written. 'len_dat(n)' here gives the
buffer length.

3. If the record is an output record, the instructions for moving the
data from each field into the record buffer will be generated.

4. If the record is an output record and a SUBSET condition was
specified for it we enclose the code for writing the record in the
condition:

IF SUBSET$rec THEN DO ;
code

END ;

The procedure DO_REC produces the code for reading and writing of
records. It branches according to the cases in Table 7.1.


Table 7.1 The various cases of program I/O control

Case 1: Input, Sequential and Nonkeyed Record.

The following code is produced:

IF $FSTfile_suff THEN DO ;

READ FILE(file_suff) INTO(recbuf) ;

$FSTfile_suff = '0'B ;

END ;

ELSE recbuf = filebuf ;
recindx = 1 ;

IF ¬ENDFILE$file_name THEN

READ FILE(file_suff) INTO(filebuf) ;

$ERROR_BUF = recbuf ;

The  movement  of  the  data  to  the  individual  fields  will  be  done  in 
conjunction  with  the  nodes  corresponding  to  the  fields  (see 
GENITEM).  The  next  record  is  always  read  into  file  buffer  so  that 
we  can  unpack  the  data  for  the  NEXT  record. 

Case 2: Input, Sequential and Keyed Record.

Ensure that the following declarations have been issued:
DCL FOUND$rec BIT(1) ;

DCL PASSED$rec BIT(1) ;

Issue now the code:

FOUND$rec, PASSED$rec = '0'B ;

DO WHILE(¬ENDFILE$file_name & ¬PASSED$rec) ;

READ FILE(file_suff) INTO(recbuf) ;

(code for extracting the key field)

IF keyname = POINTER$rec THEN
FOUND$rec, PASSED$rec = '1'B ;

ELSE IF keyname > POINTER$rec THEN
PASSED$rec = '1'B ;

END ;

recindx = 1 ;

Case 3: Input, Nonsequential (ISAM), Keyed Record.

Verify that the declaration
DCL FOUND$rec BIT(1) ;
has been issued. Then issue the code:
FOUND$rec = '1'B ;

ON KEY(file_suff) FOUND$rec = '0'B ;

READ FILE(file_suff) INTO(recbuf)

KEY(POINTER$rec) ;

recindx = 1 ;

Case 4: Output, Sequential Record.

Issue the following code:
recindx = 1 ;

Call PACK procedure to pack its fields into the record buffer. Then
issue the code:

WRITE FILE(file_suff) FROM(recbuf) ;

Case 5: Output, Nonsequential, Keyed and an Update Record (both NEW and
OLD specified)

Issue the following code:
recindx = 1 ;

Call PACK procedure to pack its fields into the record buffer. Then
issue the code:

REWRITE FILE(file_suff) FROM(recbuf)

KEY(POINTER$rec) ;

Case 6: Output, Nonsequential and Keyed Record.

Issue the following code:
recindx = 1 ;

Call PACK procedure to pack its fields into the record buffer. Then
issue the code:

WRITE FILE(file_suff) FROM(recbuf)

KEY(POINTER$rec) ;
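
The behaviour of the code generated for Case 2 can be traced with a
small Python simulation. It is a sketch only: the file is modelled as a
list of (key, data) pairs in ascending key order, the end-of-file
condition is modelled as running off the end of that list, and the
function name is hypothetical.

```python
# Illustrative simulation of Case 2 only.

def read_keyed_sequential(records, start, pointer):
    """Read forward from `start` until the key is found or passed.

    Returns (found, last_record_read, next_position).
    """
    found = passed = False
    pos = start
    rec = None
    while pos < len(records) and not passed:   # ¬ENDFILE & ¬PASSED
        rec = records[pos]                     # READ FILE ... INTO(recbuf)
        pos += 1
        key = rec[0]                           # extract the key field
        if key == pointer:
            found = passed = True
        elif key > pointer:
            passed = True                      # went past the wanted key
    return found, rec, pos


recs = [(1, "a"), (3, "b"), (5, "c")]
hit = read_keyed_sequential(recs, 0, 3)
miss = read_keyed_sequential(recs, 0, 4)
```

The PASSED flag stops the search as soon as a larger key is read, so a
missing key costs only one record beyond the position where it would
have been.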


7.6  PACKING  AND  UNPACKING 

After  a  record  is  read  we  unpack  its  fields  from  the  record  buffer 
and  place  them  in  the  respective  declared  structures.  Similarly  before 
a  record  is  written  we  pack  its  fields  into  the  record  buffer.  The  data 
movement  is  performed  by  individual  transfers  of  fields.  The  transfer 
statements  may  be  interleaved  with  other  statements  which  control  the 
iteration  over  respective  fields'  dimensions.  The  transfer  instructions 
for  unpacking  are  generated  elsewhere,  in  conjunction  with  the  schedule 
elements  associated  with  the  input  field  nodes.  The  code  for  packing  an 
output record is generated in GENIOCD and inserted right before the
record  buffer  is  to  be  written  out. 


7.6.1  PACK  -  PACKING  THE  OUTPUT  FIELDS 

The procedure PACK is called by GENIOCD in the case of an output record.
It accepts a node number (NODES) as input. It checks the type of the
node NODES. If the node is a field, it calls DO_FLD to generate the
code for packing. Otherwise, it considers in turn each descendant of
the node NODES. For each descendant D it calls PACK1(D) recursively.

PACK1: This procedure generates code for packing a node which may or
may not repeat.

1. If the node is a repeating group or a field we get the termination
criterion of the repetition.

1.1 Open a loop: Call procedure GENDO to generate the DO-statement
for opening the loop.

1.2 Call the subprocedure PACK to issue code for packing a single
element of the node.

1.3 Call procedure GENEND to generate the code for terminating the
loop.

2. If the node is not repeating then:

Call procedure PACK to generate the code for packing all the
constituent members of this node.

DO_FLD: This procedure is responsible for producing code to pack a
field F into the record buffer. It uses the procedure FIELDPK to
generate the following code.

SUBSTR(recbuf,recindx,lenstring) = F ;
recindx = recindx+lenstring ;

FIELDPK is described further below.


7.6.2  GENITEM  -  UNPACKING  THE  INPUT  FIELDS 

This  procedure  is  called  to  generate  code  for  unpacking  information 
from an input buffer to an input field. GEN_NODE calls GENITEM upon
scanning a schedule element of an input field. GENITEM accepts as input
the node number in the schedule element. The READ statement for reading
the record to a buffer is generated by GENIOCD when the record node is
scanned. GENITEM first finds for a record R the names of the input
buffer R_S and the packing counter R_INDX. Next, GENITEM calls an
auxiliary procedure FIELDPK, which generates the code for unpacking.

The GENITEM procedure is as follows:

1. Determine the name of the record containing the current field. Let
it be rec. Then we construct a buffer name rec_S and a buffer
index name rec_INDX. Let the field's name be in the variable
"field".

2. If the corresponding field in the next record is referenced, then
call FIELDPK to unpack the field from the file buffer.

3. Call FIELDPK to generate the code for unpacking the field from the
record buffer.


7.6.3 FIELDPK - PACKING AND UNPACKING FIELDS

The procedure FIELDPK produces the code for both the packing and
unpacking operations. Input parameters are the field name, buffer name,
record index name, and a code (CASE) to indicate whether the field has a
NEXT prefix.

1.  If  the  length  type  of  the  field  is  fixed,  i.e.  specified  in  the  data 
description  statements,  we  compute  its  length  directly.  If  the 
field's  type  is  'C',  'N',  or  'P',  denoting  respectively  character, 
numeric  or  picture,  we  take  the  declared  length.  Otherwise  we  will 
compute  the  length  of  the  field  in  bytes  from  its  declared  length  and 
type.  The  string  representing  the  length  is  stored  in  "lenstring". 

2. If the length of the field was declared by specifying lower and upper
bounds we check that there exists a control variable of the form
LEN.field for this field. If none exists we issue the error message:

FIELDPK: NO LENGTH SPECIFICATION FOR THE FIELD - field.

3. If a LEN.field control variable is found we set:

lenstring = LEN.field

The byte-length of the field will be computed during run time.

4. If the field is an input field we generate the instruction:

UNSPEC(field) = SUBSTR(rec_S,rec_INDX,lenstring) ;

If the same field in the next record is referred to in the
specification, we will unpack the file buffer to get the
corresponding field in the next record. For an output field we
generate:

SUBSTR(rec_S,rec_INDX,lenstring) = UNSPEC(field) ;

Here "field" is the name properly subscripted and "lenstring" is the
length specification. If the field is of type 'C', the UNSPEC
qualifications will be omitted.

5. If the CASE code indicates that the field name does not have the
prefix NEXT then we generate the following code to update the buffer
index:

rec_INDX = rec_INDX+lenstring ;

There is no need to update rec_INDX if the unpacking is for a NEXT
prefixed field.
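
Steps 4 and 5 can be summarized by a short Python sketch that builds
the transfer statements as strings. The function signature is an
assumption of the sketch (the Processor passes a CASE code rather than
the boolean flags used here), and the length is taken as already
computed.

```python
# Illustrative sketch only; the signature is hypothetical.

def fieldpk(field, buf, indx, lenstring, is_input,
            ftype="C", is_next=False):
    """Return the PL/I-like lines that pack or unpack one field."""
    # UNSPEC is omitted for character fields (type 'C').
    ref = field if ftype == "C" else f"UNSPEC({field})"
    sub = f"SUBSTR({buf},{indx},{lenstring})"
    if is_input:
        lines = [f"{ref} = {sub} ;"]      # unpack: buffer to field
    else:
        lines = [f"{sub} = {ref} ;"]      # pack: field to buffer
    if not is_next:
        # advance the buffer index unless this is a NEXT-prefixed field
        lines.append(f"{indx} = {indx}+{lenstring} ;")
    return lines


unpack = fieldpk("AMOUNT", "REC_S", "REC_INDX", "4", True, ftype="B")
pack = fieldpk("NAME", "REC_S", "REC_INDX", "LEN.NAME", False)
peek = fieldpk("AMOUNT", "FILE_S", "REC_INDX", "4", True,
               ftype="B", is_next=True)
```

The NEXT case leaves the index untouched because the same position is
used again when the field is unpacked from the record buffer.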


7.7  GENERATING  THE  PROGRAM  ERROR  FILE 

If a program error condition is induced during the execution of the
generated program, then the input record read during the iteration
in which the program error was induced is written to an error
file, ERRORF. The required code for writing the bad input record to the
error file is generated by the routines CODEGEN and GENIOCD. For
example, the following PL/I code is included in the PL1ON file:

ON ERROR BEGIN ;

WRITE FILE(ERRORF) FROM($ERROR_BUF) ;

GO TO $ERR_LAB ;

END ;

After GENIOCD generates the code to read a record from an input file
it also generates a statement to copy the input record into $ERROR_BUF.


7.8 GPL1DCL - GENERATING PL/I DECLARATION

This  procedure  generates  the  declarations  for  the  data  nodes 
declared  by  the  user  and  those  added  by  the  system.  As  noted 
previously,  some  declarations  are  also  generated  by  other  procedures 
during  the  code  generation. 

The  main  part  of  GPL1DCL  is  as  follows: 

1.  For  each  file  F  in  the  specification  (available  from  the  list  FILIST) 
call 

DECLARE_STRUCTURE(F) 
to  declare  F  and  all  its  descendants. 

2.  For  each  node  N  in  the  specification  which  is  an  interim  variable  or 
a  control  variable,  call 

DECLARE_STRUCTURE(N)

3. For each subscript which has been used, issue the declaration:

DCL subname FIXED BIN ;


7.8.1 DECLARE_STRUCTURE - DECLARING A STRUCTURE

This procedure is called by GPL1DCL. The input is a file node
number. It declares the entire file structure. It issues the
declarative DECLARE, and then proceeds to call DCL_STR(N,1,0).


7.8.1.1 DCL_STR(N, LEVEL, SUX)

This  recursive  procedure  produces  a  declaring-clause  for  each  node 
N  in  the  structure.  'LEVEL'  is  the  current  level  in  the  structure.  SUX 
is  a  termination  criterion  stating  whether  there  is  a  next  node  on  the 
same  level  (younger  brother)  or  a  descendant. 

1. Some preliminary transformations are made on the declared node names.

1.1 File names of the form NEW.F and OLD.F are modified to NEW_F and
OLD_F respectively.

1.2 The group names, record names, or field names are reduced to
their stem (removing prefixes).

2. For control variables the resulting declaration is:

For SIZE and LEN names:
name FIXED BIN,
while for all other names:
name BIT(1).

3.  The  declaration  includes  in  general  the  following  items: 

LEVEL  -  The  component  level. 

Name  -  The  declared  name. 

Repetition  -  The  number  of  physical  storage  elements. 

Type  -  The  data  type. 

The  data  type  is  determined  as  follows: 

For character fields - CHAR(len) [VARYING]

For numeric fields - PIC '99...9'

For picture fields - PIC 'picture'

For fixed binary - BIN FIXED(len, scale)

For fixed decimal - DEC FIXED(len, scale)

For binary floating - BIN FLOAT(len)

For decimal floating - DEC FLOAT(len)

In  the  above  'len'  is  the  specified  or  default  length  for  the  field. 
The  VARYING  option  is  taken  if  the  length  is  specified  (for  strings) 
by  a  minimal  length  and  a  maximal  length. 

Repetition is defined in STOTYP of the node subscripts of the
fields. If an array dimension is virtual we omit the repetition
indicator. If an array dimension is a window of width k+1, the
repetition is set to k+1. Otherwise, the array dimension must be a
physical dimension. The node subscript list of the field node is
scanned, and the repetition indicators for array dimensions are
concatenated and put into a variable REP. If REP is not an empty
string, we will append the string '(REP)' after the declared field
name.


4. For each descendant M of the node, call
DCL_STR(M, LEVEL+1, termination) recursively.
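
The recursion in steps 1 to 4 can be sketched in Python as below. This is a sketch under several assumptions that the text does not state and that are marked in the comments: the "stem" of a name is taken to be the part after the last period, a physical dimension repeats by its size, repetition indicators are joined with commas, and SUX is read as "another clause follows this subtree", which selects a comma or a semicolon as the clause terminator. The Node class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str
    kind: str                 # "file", "group", "control", or a field type tag
    length: int = 0
    scale: int = 0
    picture: str = ""
    varying: bool = False     # length was given as a minimum and a maximum
    dims: List[Tuple[str, int]] = field(default_factory=list)  # (storage type, size)
    children: List["Node"] = field(default_factory=list)

def declared_name(n):
    if n.kind == "file":
        return n.name.replace(".", "_")      # step 1.1: NEW.F -> NEW_F
    if n.kind == "control":
        return n.name
    return n.name.rsplit(".", 1)[-1]         # step 1.2 (assumed meaning of "stem")

def data_type(n):
    if n.kind == "control":                  # step 2
        return "FIXED BIN" if n.name.startswith(("SIZE", "LEN")) else "BIT(1)"
    table = {                                # step 3, "Type"
        "char": f"CHAR({n.length})" + (" VARYING" if n.varying else ""),
        "numeric": "PIC '" + "9" * n.length + "'",
        "picture": f"PIC '{n.picture}'",
        "binfixed": f"BIN FIXED({n.length},{n.scale})",
        "decfixed": f"DEC FIXED({n.length},{n.scale})",
        "binfloat": f"BIN FLOAT({n.length})",
        "decfloat": f"DEC FLOAT({n.length})",
    }
    return table.get(n.kind, "")             # files and groups carry no type

def repetition(n):
    # Virtual dimensions are omitted; a window contributes its width k+1;
    # a physical dimension contributes its size (assumption).
    rep = ",".join(str(size) for storage, size in n.dims if storage != "virtual")
    return f"({rep})" if rep else ""

def dcl_str(n, level, sux, out):
    clause = f"{level} {declared_name(n)}{repetition(n)}"
    dtype = data_type(n)
    if dtype:
        clause += " " + dtype
    out.append(clause + ("," if (n.children or sux) else ";"))
    last = len(n.children) - 1
    for i, m in enumerate(n.children):       # step 4
        dcl_str(m, level + 1, sux or i < last, out)

def declare_structure(n):
    out = ["DECLARE"]
    dcl_str(n, 1, False, out)
    return out

rec = Node("INREC", "group", children=[
    Node("INREC.NAME", "char", length=20, varying=True),
    Node("INREC.QTY", "numeric", length=3, dims=[("virtual", 0), ("window", 2)]),
    Node("INREC.PRICE", "decfixed", length=7, scale=2, dims=[("physical", 12)]),
])
print("\n".join(declare_structure(Node("NEW.STOCK", "file", children=[rec]))))
```

For the sample structure the sketch emits one clause per node, ending every clause but the last with a comma, so the result forms a single DECLARE statement.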


7.9  CGSUM  -  CODE  GENERATION  CONCLUSION 


CGSUM has the task of concluding the code generation phase. First,
the different files with the generated PL/1 program (PL1DCL, PL1ON,
PL1EX) are merged into one PL/1 file (PL1PROG) which can be subsequently
compiled. Secondly, a Code Generation Summary Report is written which
lists the PL/1 program. While the PL/1 listing would not be of much use
to the average MODEL user, it is of interest to the more sophisticated
user and can serve the system programmer for insight or debugging of the
MODEL system.
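
The two tasks of CGSUM can be illustrated with a short Python sketch. It assumes that the part files are merged in the order in which they are listed above and that the report is a title line followed by the program listing; neither detail is stated in the text, and the function name, parameters, and sample PL/1 lines are hypothetical.

```python
import tempfile
from pathlib import Path

def cgsum(part_paths, program_path, report_path):
    """Merge the generated parts into one file and write a listing report."""
    lines = []
    for p in part_paths:                      # merge order is an assumption
        lines.extend(Path(p).read_text().splitlines())
    Path(program_path).write_text("\n".join(lines) + "\n")
    # Hypothetical report layout: a title, a blank line, then the listing.
    report = ["CODE GENERATION SUMMARY REPORT", ""] + lines
    Path(report_path).write_text("\n".join(report) + "\n")
    return len(lines)

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    # Sample contents, invented for the illustration.
    (root / "PL1DCL").write_text("DCL I FIXED BIN ;\n")
    (root / "PL1ON").write_text("ON ENDFILE(INF) GO TO DONE ;\n")
    (root / "PL1EX").write_text("I = 1 ;\n")
    parts = [root / "PL1DCL", root / "PL1ON", root / "PL1EX"]
    count = cgsum(parts, root / "PL1PROG", root / "REPORT")
    print(count, "lines merged")
    print((root / "REPORT").read_text())
```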